Digitising a PDF

The task

I was recently asked to take a PDF plan of a quarry that was produced in 2007 and turn it back into a GIS or CAD type format. The backstory was that the surveying company that did the site survey and produced the plan no longer had access to the original data. It was 15 years ago, so I guess that’s not shocking. I believe the end goal is to produce a DTM surface for use in volume calculations, but I’m yet to have that confirmed.

The (PDF) plan I was supplied was a 1:2000 scale (on A3) detail survey of the site. Being a presentation plot it has five title blocks, a north arrow, various labels and most significantly for this exercise, 1m contours which range in elevation over a total of about 48m. In the area of interest, only two elevations were considered index contours (with a slightly heavier pen stroke), and these were only 5m apart (from each other) and both at the upper end of the elevation range. In other words, the deeper ~30m of contours were all unlabelled and indistinguishable from each other visually.

I had sufficient additional information to georeference the plan, even though the best version of it I was supplied had no grid or coordinates on it. I was supplied a second version with a grid, but it was at 1:6000 scale so it was even worse.

Attempt 1

Initially, I georeferenced the PDF into QGIS using the built in Georeferencer with a Helmert transformation. This produces a raster. Next, I started digitising the contours in a separate layer, working from the heavier index contours I could make out. This worked okay for a bit, but in steeply sloping areas the contours overlap each other and become visually indistinct, and I genuinely could not figure out a series of significant elevations in the deeper parts of the excavation. This was a problem.

Attempt 2

Inspecting the PDF in a viewer, I could see it was a vector PDF. When zooming in, the features were redrawn sharply. However, the pen stroke was such that even at maximum zoom the features all blended into each other. Nevertheless, it pointed a way forward. Now I’m sure there are other ways to do this, but I needed a method for getting my data from the PDF into QGIS with free software. Here’s what I did.

1. pdftocairo

First, I used pdftocairo in Ubuntu on Windows (WSL) to turn the PDF into an SVG (scalable vector graphics) format.

2. Inkscape

Next, I opened the SVG in Inkscape, and exported it as a DXF.

3. QGIS

Next, I opened the DXF in QGIS. Everything in the document (texts, title blocks, contours) came in as lines. That was fine with me. But of course the scale, rotation and translation were all wrong and there was no CRS assigned. I was able to correct those with the transform tools (specifically, affine transform) once I figured out the scaling factors. I saved the results in a Geopackage, because it’s 2022 and Shapefiles are so 1997.

Unfortunately the contours came out as dashed lines in a billion separate segments so I didn’t have an easy way so assign elevations, so I ended up digitising new segments over the top using the snapping tools (actually the layer had ~135k features). So while still very manual I at least had an ability to zoom in on features and have them resolve nicely, which meant I could distinguish between very closely spaced features and determine the elevations for the entire site by carefully working away from the known areas. The result actually came out really well, better than I would have thought was possible before I started.

In summary, the data I worked on went something like this:

Some raw surveying format > AutoCAD > PDF > …brew for 15 years… > SVG > DXF > Geopackage > whatever comes next.

The moral of the story is:

Backup your data
Free software is great
Export your PDFs as vector where possible