Parsing SVG paths in R




I'm trying to crack an R workflow for parsing SVG paths, using this file on this webpage. I'm encountering artifacts in the positioning of resulting polygons:



enter image description here



Some of the countries do not align with their neighbours - e.g. US/Canada, US/Mexico, Russia/Asian neighbours. Since the effect hits the countries with more complex polygons it seems likely to be a problem to do with cumulative summing, but I'm unclear where the problem lies in my workflow, which is:


nodejs



I reproduce the full workflow here using R (for US/Canada), with an external call to nodejs:


require(dplyr)
require(purrr)
require(stringr)
require(tidyr)
require(ggplot2)
require(rvest)
require(xml2)
require(jsonlite)

# Get and parse the SVG
doc = read_xml('https://visionscarto.net/public/fonds-de-cartes-en/visionscarto-bertin1953.svg')

countries = doc %>% html_nodes('.country')
names(countries) = html_attr(countries, 'id')
cdi = str_which(names(countries), 'CIV') # unicode in Cote d'Ivoire breaks the code
countries = countries[-cdi]

# Extract SVG paths and parse with node's svg-path-parser module.
# If you don't have node you can use this instead (note this step might be the problem):
# d = read_csv('https://gist.githubusercontent.com/geotheory/b7353a7a8a480209b31418c806cb1c9e/raw/6d3ba2a62f6e8667eef15e29a5893d9d795e8bb1/bertin_svg.csv')

d = imap_dfr(countries, ~{
  message(.y)
  svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
  # inner double quotes must be escaped so the whole script reaches node as one argument
  node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
                     "'; console.log(JSON.stringify(parseSVG(d)));\"")
  system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame()


# some initial processing
d1 = d %>% filter(country %in% c('USA United States','CAN Canada')) %>%
  mutate(x = replace_na(x, 0), y = replace_na(y, 0), # NAs need replacing
         relative = replace_na(relative, FALSE),
         grp = (command == 'closepath') %>% cumsum) # polygon grouping variable

# new object to loop through
d2 = d1 %>% mutate(x_adj = x, y_adj = y) %>% filter(command != 'closepath')

# loop through and change relative coords to absolute
for(i in 2:nrow(d2)){
  if(d2$relative[i]){ # cumulative sum where coords are relative
    d2$x_adj[i] = d2$x_adj[i-1] + d2$x_adj[i]
    d2$y_adj[i] = d2$y_adj[i-1] + d2$y_adj[i]
  } else { # code M/L require no alteration
    if(d2$code[i] == 'V') d2$x_adj[i] = d2$x_adj[i-1] # absolute vertical transform inherits previous x
    if(d2$code[i] == 'H') d2$y_adj[i] = d2$y_adj[i-1] # absolute horizontal transform inherits previous y
  }
}



# plot result
d2 %>% ggplot(aes(x_adj, -y_adj, group = paste(country, grp))) +
  geom_polygon(fill='white', col='black', size=.3) +
  coord_equal() + guides(fill=F)



enter image description here



Any assistance appreciated. The SVG path syntax is specified at w3 and summarised more concisely here.



Edit (response to @ccprog)



Here is data returned from svg-path-parser for the H command sequence:




  code  command           x      y     relative country
  <chr> <chr>             <dbl>  <dbl> <lgl>    <chr>
1 l     lineto            -0.91  -0.6  TRUE     CAN Canada
2 l     lineto            -0.92  -0.59 TRUE     CAN Canada
3 H     horizontal lineto 189.   NA    NA       CAN Canada
4 l     lineto            -1.03  0.02  TRUE     CAN Canada
5 l     lineto            -0.74  -0.07 TRUE     CAN Canada



Here is what d2 looks like for the same sequence after the loop:




  code  command           x      y     relative country    grp   x_adj y_adj
  <chr> <chr>             <dbl>  <dbl> <lgl>    <chr>      <int> <dbl> <dbl>
1 l     lineto            -0.91  -0.6  TRUE     CAN Canada 20    199.  143.
2 l     lineto            -0.92  -0.59 TRUE     CAN Canada 20    198.  143.
3 H     horizontal lineto 189.   0     FALSE    CAN Canada 20    189.  143.
4 l     lineto            -1.03  0.02  TRUE     CAN Canada 20    188.  143.
5 l     lineto            -0.74  -0.07 TRUE     CAN Canada 20    187.  143.



Does this not look OK? When I look at the raw values of y_adj for the H row and the previous rows, they are identical: 142.56.




d = imap_dfr(countries, ~{
  message(.y)
  svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
  # makeAbsolute() converts every command to absolute coordinates on the node side
  node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
                     "'; console.log(JSON.stringify(parseSVG.makeAbsolute(parseSVG(d))));\"")
  system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame() %>%
  mutate(grp = (command == 'moveto') %>% cumsum) # a new polygon starts at each moveto

d %>% ggplot(aes(x, -y, group = grp, fill=country)) +
  geom_polygon(col='black', size=.3, alpha=.5) +
  coord_equal() + guides(fill=F)






I've also submitted this to the svg-path-parser module on github

– geotheory
Sep 9 '18 at 9:57






I'm not familiar with R, but to me it looks like you divide the path into groups by looking for closepath commands, and then take the first moveto in each group as the starting point from which to accumulate positions for the conversion to absolute. Two sources of error are: 1. moveto commands, apart from the first one, can also be relative (to the last coordinate of the previous group). 2. Groups need not be closed with a closepath command. Searching for the opening moveto would be more reliable.

– ccprog
Sep 9 '18 at 15:42








Hi @ccprog. I do use closepath to create variable grp (that identifies unique polygons), but it does not have any role in parsing the actual coordinates. In fact I just use the SVG relative field which as I understand specifies when coordinates are relative or absolute. With absolute codes you have to also account for H/V commands, which inherit the inactive coordinate from the previous point.

– geotheory
Sep 9 '18 at 15:54






1 Answer



Look at your rendering of Canada, especially the southern coast of Hudson Bay. There is a very obvious error. Sifting through the path data, I found the following sequence in the original data:


h-2.28l-.91-.6-.92-.59H188.65l-1.03.02-.74-.07-.75-.07-.74-.07-.74-.06.88 1.09



I've loaded your rendering result into Inkscape, and drawn the relevant part of the path on top, the arrow marking the segment drawn by the absolute H command. (The z command has been removed, that is the reason for the missing segment.) It is obvious that somewhere in there a segment is too long.



enter image description here



It turns out the absolute H corrects the previous (horizontal) error. Look at the preceding point: it is 198., 143., but it should be 191.76,146.07. The vertical error remains at about -3.6.





I've made a codepen that overlays the original path data with your rendering as precisely as possible. The path data have been divided into the (single-polygon) groups and converted to absolute by Inkscape. Unfortunately, the program cannot convert them to polygon primitives, so there are still V and H commands in there.



It shows this:


group0



I've made some visual measurements of that deviation (error ~0.05), and they ultimately give the clue:


group01: 0.44,-0.73
group02: 0.84,-1.12
group03: 2.04,-1.44
group04: 2.94,-1.73
group05: 2.60,-1.86
group06: 3.14,-2.38
group07: 3.68,-2.54
group08: 4.03,-3.35
group09: 4.87,-2.97
group10: 6.08,-3.50 (begin)
group10: 0.00,-3.53 (end)
group11: 1.08,-1.95
group12: 2.05,-2.45
group13: 2.89,-2.84
group14: 3.64,-3.67
group15: 4.48,-3.44
group16: 4.04,-3.99
group17: 4.32,-3.08
group18: 4.75,-2.75
group19: 5.72,-2.95
group20: 5.40,-3.11
group21: 6.02,-2.95
group22: 6.63,-4.14
group23: 6.85,-5.00
group24: 7.14,-4.86
group25: 7.72,-4.39
group26: 8.65,-4.75
group27: 9.49,-4.39
group28: 10.20,-4.44
group29: 11.13,-4.58



You are removing the closepath commands and then computing the first point of the next group relative to the last explicit point of the previous group. But closepath actually moves the current point: back to the position of the last moveto command. The two may, but need not, be identical.
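
A small made-up path shows the effect. The path string and the variable names below are hypothetical, chosen only for illustration and not taken from the map data: M10,10 l5,0 l0,5 z m20,0 l5,0. After z the current point is back at (10,10), so the relative m20,0 should start the second group at (30,10). If the z row is dropped and the offset is added to the last drawn point instead, the group starts at (35,15).

```r
# Hypothetical toy path: M10,10 l5,0 l0,5 z m20,0 l5,0
start      <- c(x = 10, y = 10)              # absolute moveto of the first group
last_drawn <- start + c(5, 0) + c(0, 5)      # last explicit point before z: (15, 15)
rel_move   <- c(x = 20, y = 0)               # the relative "m20,0" that follows z

naive   <- last_drawn + rel_move             # closepath dropped: offset from last drawn point
correct <- start + rel_move                  # closepath honoured: offset from group start

print(naive)            # x = 35, y = 15
print(correct)          # x = 30, y = 10
print(naive - correct)  # x = 5, y = 5: the error carried into the next group
```

Because each later relative moveto builds on the previous (already shifted) position, such offsets add up from group to group, which would fit the growing deviations measured above.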





I can't give you a ready script in R, but what you need to do is this: at the beginning of a new group, cache the position of the first point. At the beginning of the next group, compute the new first point relative to that cached point.
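
For illustration only, a minimal base R sketch of that caching idea might look like the following. It is an assumption-laden example, not a tested solution for the map file: the function name to_absolute and the output columns x_abs, y_abs and grp are hypothetical, the input is assumed to have the code, x and y columns shown in the question, and only M/m, L/l, H/h, V/v and Z/z commands are handled (no curves).

```r
# Sketch: convert parsed path commands to absolute coordinates while
# keeping the closepath rows, so that z/Z can reset the current point.
# Assumes columns `code`, `x`, `y`, one row per command.
to_absolute <- function(d) {
  n <- nrow(d)
  out_x <- numeric(n)
  out_y <- numeric(n)
  grp <- integer(n)
  cur <- c(0, 0)     # current point
  start <- c(0, 0)   # cached first point of the current group
  g <- 0L            # group counter, incremented at each moveto
  for (i in seq_len(n)) {
    code <- d$code[i]
    cmd <- toupper(code)
    relative <- code != cmd                # lower-case codes are relative
    if (cmd == "Z") {
      cur <- start                         # closepath returns to the cached point
    } else {
      base <- if (relative) cur else c(0, 0)
      # V keeps the previous x, H keeps the previous y
      nx <- if (cmd == "V") cur[1] else base[1] + d$x[i]
      ny <- if (cmd == "H") cur[2] else base[2] + d$y[i]
      cur <- c(nx, ny)
      if (cmd == "M") {
        start <- cur                       # cache the first point of the new group
        g <- g + 1L
      }
    }
    out_x[i] <- cur[1]
    out_y[i] <- cur[2]
    grp[i] <- g
  }
  cbind(d, x_abs = out_x, y_abs = out_y, grp = grp)
}

# Made-up input: M10,10 l5,0 l0,5 z m20,0 l5,0 H40 v-3
toy <- data.frame(
  code = c("M", "l", "l", "z", "m", "l", "H", "v"),
  x    = c(10,  5,   0,   NA,  20,  5,   40,  NA),
  y    = c(10,  0,   5,   NA,  0,   0,   NA,  -3),
  stringsAsFactors = FALSE
)
res <- to_absolute(toy)
print(res[, c("code", "x_abs", "y_abs", "grp")])
# the relative m after z lands at (30, 10), not (35, 15)
```

The closepath rows stay in the result with the coordinates of the group's first point, so they can either be kept (the polygon is closed explicitly) or filtered out after the conversion rather than before it.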






Thanks for help ccprog. Good to focus on specifics like this. So it's true I set the NA values for y variable of H commands to zero. But later I override that with if(d2$code[i] == 'H') d2$y_adj[i] = d2$y_adj[i-1] - basically inherit previous y value. I include relevant data.frame sections for d and d2 - added to question.

– geotheory
Sep 9 '18 at 17:14








I understand, but I am absolutely sure the culprit is that absolute H command. I've added a screenshot to prove my point.

– ccprog
Sep 9 '18 at 17:42






No, the absolute H corrects the (horizontal) error. Look at the preceding point: it is 198., 143., but it should be 191.76,146.07. The vertical error remains. If, in addition, I account for the vertical error and move your rendering up by dy=-3.6, the very first point of the path data matches. As far as I can judge, all other path groups are internally consistent, but the further down in the path data they are, the more they are off to the bottom left.

– ccprog
Sep 9 '18 at 19:08









This is going to take some thinking. Will come back shortly.

– geotheory
Sep 9 '18 at 20:07






I've made a codepen that overlays the original path data with your rendering as precisely as possible. The path data have been divided into the (single-polygon) groups and converted to absolute by Inkscape. Unfortunately, the program cannot convert them to polygon primitives, so there are still V and H commands in there.

– ccprog
Sep 9 '18 at 20:11


