GenCAD: MIT's Image-to-CAD Model and the Early Adopter Window

GenCAD sketch and image inputs converted to 3D CAD parts

Researchers Ferdous Alam and Faez Ahmed at MIT open-sourced GenCAD, an image-conditioned generative model that produces parametric CAD command sequences. Feed it a hand-drawn sketch or a single rendered image of a part, and the model outputs the CAD program that builds it. The repo exports an STL today, but the program underneath is the part to care about.

Paper: TMLR 2025, arXiv 2409.16294. Repo: github.com/ferdous-alam/GenCAD. Project page: gencad.github.io.

What GenCAD actually is

Three models, one pipeline:

CSR. An autoregressive transformer that learns latent representations of CAD operation sequences. Sketch, extrude, the lot.
CCIP. A CLIP-style contrastive model that maps images into that CAD latent space.
Diffusion prior. Samples a CAD latent conditioned on the image embedding. A decoder then turns that latent back into commands a geometry kernel can execute.
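CCIP is described as CLIP-style, so its objective is presumably the familiar symmetric contrastive loss: each image should score highest against its own CAD program and low against every other program in the batch. Below is a minimal plain-Python sketch of that generic CLIP loss, not GenCAD's training code. The function names and the 0.07 temperature (CLIP's starting value) are assumptions for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def contrastive_loss(image_embs, cad_embs, temperature=0.07):
    """Symmetric InfoNCE: image i should match CAD program i and nothing else."""
    imgs = [l2_normalize(v) for v in image_embs]
    cads = [l2_normalize(v) for v in cad_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and CAD program j, sharpened by temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], cads[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows):
        # Mean of -log softmax(row)[i]: the matching pair sits on the diagonal.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]  # CAD -> image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))


pairs = [[1.0, 0.0], [0.0, 1.0]]
print(contrastive_loss(pairs, pairs) < contrastive_loss(pairs, pairs[::-1]))  # True
```

Matched pairs give a loss near zero; scramble the pairing and the loss jumps. That pressure is what pulls images and CAD programs into one shared space, which is the space the diffusion prior then conditions on.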

The training corpus is roughly 7,000 CAD programs. The output is parametric, meaning the operations themselves are recoverable, not just the mesh. That distinction is what makes the program editable in principle, even if today's repo exports STL by default.
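Why parametric matters, in miniature. The sketch below models a CAD program as a list of operations and replays it to get geometry. The two-op vocabulary (RectSketch, Extrude) is hypothetical and far smaller than GenCAD's real command set, and a volume number stands in for what a geometry kernel would actually build.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RectSketch:
    width: float   # mm (hypothetical op, for illustration only)
    height: float  # mm


@dataclass(frozen=True)
class Extrude:
    distance: float  # mm, along the sketch normal


def build_volume(program):
    """Replay a sketch/extrude program and return the solid's volume in mm^3.

    Toy simplification: each extrude is treated as a separate, non-overlapping body.
    """
    area = None
    volume = 0.0
    for op in program:
        if isinstance(op, RectSketch):
            area = op.width * op.height
        elif isinstance(op, Extrude):
            if area is None:
                raise ValueError("Extrude needs a preceding sketch")
            volume += area * op.distance
            area = None
    return volume


plate = [RectSketch(width=40.0, height=20.0), Extrude(distance=5.0)]
thicker = [plate[0], replace(plate[1], distance=8.0)]  # one-field edit
print(build_volume(plate), build_volume(thicker))  # 4000.0 6400.0
```

Changing the plate thickness is a one-field edit and a replay. With a mesh there is no field to edit, only triangles.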

What it does today (honest read)

Look at the demo grid on the project page. Two rows of sketches, two rows of CAD output. The parts are brackets, mounting plates, hex nuts, simple housings. The kind of part a junior CAD engineer drafts in twenty minutes.

That is the scope today. GenCAD is not modelling a robot arm. It is not modelling a turbine blade. The viral tweets claiming MIT killed the CAD industry are wrong on the timeline.

What's actually shipped:

Sketch-conditioned generation and photo-conditioned generation
Top-3 retrieval from the 7k-program collection
STL export with a bundled stl2img.py helper for renders
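In the usual CLIP-style setup, top-3 retrieval from a fixed collection is a nearest-neighbour search in the shared embedding space. A minimal sketch, assuming the embeddings are already computed and stored in a dict keyed by a hypothetical program ID. Cosine similarity is an assumption about the metric, not something stated in the README.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, library, k=3):
    """Return the k (program_id, score) pairs most similar to the query embedding."""
    scored = ((cosine(query, emb), pid) for pid, emb in library.items())
    best = heapq.nlargest(k, scored)
    return [(pid, score) for score, pid in best]


library = {
    "bracket": [1.0, 0.0, 0.0],
    "plate": [0.9, 0.1, 0.0],
    "nut": [0.1, 1.0, 0.0],
    "housing": [0.0, 0.0, 1.0],
}
print([pid for pid, _ in top_k([1.0, 0.0, 0.0], library)])  # ['bracket', 'plate', 'nut']
```

At roughly 7k items a brute-force scan like this is cheap; a dedicated vector index only starts to pay off at much larger collections.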

What's not in the repo yet:

Quantitative evaluation. The README literally marks the metrics section as "coming soon"
Parametric file export (STEP, BREP). You get a mesh: the ops live in the latent, but the export pipeline goes straight to STL
A web UI or hosted demo. Local only
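What "you get a mesh" means in practice: an ASCII STL file is nothing but a list of triangles. A minimal reader sketch follows. It handles the ASCII variant only; binary STL needs a separate parser, and which variant the repo writes is not stated here. Volume is computed by summing signed tetrahedra, which assumes a closed, consistently oriented mesh.

```python
def parse_ascii_stl(text):
    """Return a list of triangles, each a tuple of three (x, y, z) vertices."""
    triangles, current = [], []
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "vertex":
            current.append(tuple(float(p) for p in parts[1:4]))
            if len(current) == 3:
                triangles.append(tuple(current))
                current = []
    return triangles


def mesh_volume(triangles):
    """Divergence-theorem volume; assumes a closed, consistently oriented mesh."""
    total = 0.0
    for (ax, ay, az), (bx, by, bz), (cx, cy, cz) in triangles:
        # a . (b x c) / 6 is the signed volume of the tetrahedron (origin, a, b, c)
        total += (ax * (by * cz - bz * cy)
                  - ay * (bx * cz - bz * cx)
                  + az * (bx * cy - by * cx)) / 6.0
    return abs(total)
```

Point it at an exported part (e.g. `parse_ascii_stl(open("part.stl").read())`, file name hypothetical) and you get triangle counts and a volume. Nowhere in that data does "extrude 5 mm" survive.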

Why the early adopter window matters

The honest comparison isn't to SolidWorks today. It's to what generative image models looked like before Midjourney v4 landed in late 2022. Slop, then suddenly not slop. The architecture here, transformer plus contrastive embedding plus diffusion prior, follows the same CLIP-plus-diffusion-prior recipe DALL·E 2 used to scale text-to-image generation.

If GenCAD follows the same curve, the engineers who already know what a CAD program looks like, what a parametric extrude is, and how to clean up an STL are the ones who will pull useful work out of v2 the day it ships. Everyone else starts at the bottom of the learning curve.

That's the window. Not because GenCAD replaces anything today. Because hands-on experience now compounds into leverage later.

Setup at a glance

Full instructions live in the repo. This is the shape, so you can decide if you can run it tonight.

You need:

A CUDA-capable GPU. The Docker image is built around CUDA plus Xvfb for headless rendering
Docker, or a conda environment with Python 3.10 and pythonocc-core 7.9.0
Patience to download the dataset and checkpoints from Google Drive
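Before committing an evening, a quick preflight tells you what's already on the machine. A minimal stdlib sketch: it checks that docker and nvidia-smi are on PATH and that the OCC package (pythonocc-core's import name) and torch are importable. Treat torch as an assumption about the repo's framework and confirm against its install instructions. The script checks presence only, not versions such as pythonocc-core 7.9.0.

```python
import importlib.util
import shutil


def preflight():
    """Report which prerequisites for a local GenCAD setup are visible."""
    return {
        "docker": shutil.which("docker") is not None,
        "nvidia-smi": shutil.which("nvidia-smi") is not None,  # proxy for a CUDA GPU driver
        # pythonocc-core installs as the top-level package OCC
        "pythonocc-core": importlib.util.find_spec("OCC") is not None,
        # assumption: the repo is PyTorch-based
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:15} {'found' if ok else 'missing'}")
```

Anything marked missing is part of your evening.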

Steps in shape:

git clone https://github.com/ferdous-alam/GenCAD
Build the Docker image, or follow the manual conda path
Download the CAD and sketch embeddings plus pretrained weights from the linked Google Drive
Run inference on a sample sketch or rendered image
View the generated STL with the bundled stl2img.py script

If you don't already have Docker and a CUDA GPU running, this is an evening of setup before you generate your first part. Worth doing if the research direction matters to you. Skip if you wanted a SaaS button.

Honest limitations

Scope is simple parts. Brackets, plates, basic primitives. Not assemblies. Not anything with subtle tolerances.
Output is STL. Mesh. You can't drop it into SolidWorks and edit the fillet radius. The parametric ops exist in the latent, but the export pipeline today goes straight to mesh.
No UI. Docker plus GPU plus CLI. Researcher artefact, not a product.
Dataset is small. 7k parts is enough for a proof, not enough to generalise to the long tail.
License unspecified in the README at time of writing. Check the repo before using output commercially.

Why this matters

Image-to-3D research has been having a moment for two years. Almost all of it produces meshes, NeRFs, or Gaussian splats, which are great for visualisation and useless for manufacturing. GenCAD is one of the first open-source models that output the actual program a CAD kernel can run.

Mesh-out tools compete with Blender. Program-out tools compete with SolidWorks. Different industry, different unit economics. Today's GenCAD demos look like greybox parts. Tomorrow's version, if the dataset grows tenfold and the architecture scales, looks like something a manufacturer can actually use.

The repo is the receipt. Star it, clone it, run it before everyone else figures out what the program-out distinction means.