Add AutoCAD DXF data source #24

Open
ghanse wants to merge 1 commit into databricks-industry-solutions:main from ghanse:ghanse/issue-23-dxf-data-source

ghanse (Collaborator) commented Apr 1, 2026

Changes

This PR introduces the following changes:

  • Implements a PySpark Python Data Source for reading AutoCAD DXF files using the ezdxf library
  • Extracts geometric entities (LINE, CIRCLE, ARC, TEXT, LWPOLYLINE, ELLIPSE, SPLINE, INSERT, POLYLINE, MTEXT, DIMENSION, HATCH, POINT) into a tabular schema with file_path, entity_type, layer, handle, and attributes (JSON) columns
  • Supports layerFilter option for read-time layer filtering, recursiveFileLookup, configurable numPartitions, and pathGlobFilter
  • Includes 13 unit tests covering path handling, entity extraction, attribute parsing, layer filtering, and partition logic
  • Adds sample DXF test fixture, README, Makefile, and requirements.txt
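The row shape and layer filtering described above can be illustrated with a minimal, library-free sketch. It is not the code in this PR: the helper names `parse_layer_filter` and `entities_to_rows` are hypothetical, plain dictionaries stand in for ezdxf entities, and treating `layerFilter` as a comma-separated list of layer names is an assumption.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Set, Tuple

# Hypothetical helpers for illustration only; the PR's real implementation
# reads entities through ezdxf and may differ in every detail.

Row = Tuple[str, str, str, Optional[str], str]


def parse_layer_filter(value: Optional[str]) -> Optional[Set[str]]:
    """Parse an assumed comma-separated layerFilter value into a set of names."""
    if not value:
        return None
    names = {name.strip() for name in value.split(",") if name.strip()}
    return names or None


def entities_to_rows(
    file_path: str,
    entities: Iterable[Dict[str, Any]],
    layer_filter: Optional[str] = None,
) -> Iterator[Row]:
    """Yield (file_path, entity_type, layer, handle, attributes) tuples."""
    allowed = parse_layer_filter(layer_filter)
    for entity in entities:
        # DXF places entities without an explicit layer on layer "0".
        layer = entity.get("layer", "0")
        if allowed is not None and layer not in allowed:
            continue  # filtered out at read time
        # Everything that is not a fixed column goes into the JSON column.
        attributes = {
            key: value
            for key, value in entity.items()
            if key not in ("type", "layer", "handle")
        }
        yield (
            file_path,
            entity["type"],
            layer,
            entity.get("handle"),
            json.dumps(attributes, sort_keys=True),
        )
```

Keeping entity-specific fields in a single JSON column lets one fixed schema cover entity types with very different attributes, such as a LINE with start and end points and a CIRCLE with a center and radius.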

To-Do

Implement a PySpark Python Data Source for reading DXF (Drawing Exchange
Format) files. Extracts geometric entities (LINE, CIRCLE, ARC, TEXT,
LWPOLYLINE, ELLIPSE, SPLINE, INSERT, etc.) into a tabular schema with
entity type, layer, handle, and JSON attributes. Supports layer filtering,
recursive file lookup, and configurable partitioning.
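One way configurable partitioning could work is to spread the discovered files over at most `numPartitions` groups. The sketch below is an assumption about the approach, not the PR's code; `split_into_partitions` is a hypothetical name, and round-robin assignment is only one of several reasonable strategies.

```python
from typing import List, Sequence


def split_into_partitions(paths: Sequence[str], num_partitions: int) -> List[List[str]]:
    """Distribute file paths round-robin over at most num_partitions groups."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # Never create more partitions than there are files.
    count = min(num_partitions, len(paths))
    if count == 0:
        return []
    buckets: List[List[str]] = [[] for _ in range(count)]
    # Sorting first keeps the assignment deterministic across runs.
    for index, path in enumerate(sorted(paths)):
        buckets[index % count].append(path)
    return buckets
```

Capping the group count at the number of files avoids scheduling empty partitions when `numPartitions` exceeds the number of DXF files found.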

Closes databricks-industry-solutions#23

Co-authored-by: Isaac
@ghanse ghanse self-assigned this Apr 1, 2026