Harmonizing population grid data into the INSPIRE data model

On October 24th, at the European Forum for Geography and Statistics conference in Kraków, Poland, Pieter Bresters of Statistics Netherlands presented his paper "Harmonizing population grid data into the INSPIRE data model". Pieter raised several issues concerning both the transformation itself and the usefulness of the resulting GML datasets.

The Population Distribution (PD) theme was designed without geometry of its own, on the assumption that it would always be disseminated together with Statistical Units (SU) spatial data. GML files created for PD therefore contain no geometry for the statistical values, only a thematic identifier of the respective statistical unit. Importing these GML files into ArcGIS (with Data Interoperability) or QGIS worked only partly or not at all. The workaround was to import the GML into MS Excel as a table and then link it to the SU geometry. This means that the harmonized data is less useful than the data before harmonization.
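
As a rough illustration of the workaround described above (it is not taken from Pieter's paper), the sketch below links PD values to SU geometry through the shared identifier. The field names `su_id`, `population` and `wkt`, the identifiers and the values are all hypothetical and only serve the example.

```python
# Hypothetical sample data: identifiers, field names and values are made up.
# PD records carry only the identifier of a statistical unit, no geometry.
pd_values = [
    {"su_id": "CELL_A", "population": 125},
    {"su_id": "CELL_B", "population": 48},
    {"su_id": "CELL_X", "population": 7},  # no matching SU record
]
# SU records carry the geometry, here as WKT strings.
su_units = [
    {"su_id": "CELL_A", "wkt": "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"},
    {"su_id": "CELL_B", "wkt": "POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))"},
]


def join_pd_to_su(pd_values, su_units):
    """Attach SU geometry to each PD value via the shared identifier."""
    geometry_by_id = {unit["su_id"]: unit["wkt"] for unit in su_units}
    joined, unmatched = [], []
    for value in pd_values:
        wkt = geometry_by_id.get(value["su_id"])
        if wkt is None:
            # Keep track of values whose statistical unit is missing.
            unmatched.append(value["su_id"])
        else:
            joined.append({**value, "wkt": wkt})
    return joined, unmatched


joined, unmatched = join_pd_to_su(pd_values, su_units)
print(len(joined), unmatched)
```

Every user has to repeat this kind of join before the data can be mapped, which is the extra step the harmonized PD files impose.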

The whole paper can be found here:

Statistics Poland has also published harmonized PD data: total population in statistical regions and census enumeration areas (SU units below the LAU2 level). Besides the difficulties mentioned by Pieter, we identified another problem: very large GML files cannot be processed by computers. We therefore had to divide the dataset into about 2500 GML files, one for each municipality (LAU2). We also received many user inquiries about how to view the data in GIS software (the same issues as those raised by Pieter). For now, we have decided to publish this PD dataset as an ESRI Shapefile as well, to make it more usable.
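
The per-municipality split can be pictured with the minimal sketch below. It is only an illustration of the idea, not the procedure actually used to produce the published files; the `lau2` field, the municipality codes and the output file names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each statistical value knows the LAU2 code
# of the municipality that contains its statistical unit.
records = [
    {"su_id": "EA_001", "lau2": "M002", "population": 310},
    {"su_id": "EA_002", "lau2": "M001", "population": 95},
    {"su_id": "EA_003", "lau2": "M002", "population": 12},
]


def split_by_municipality(records, key="lau2"):
    """Group records so that each municipality gets its own output file."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    # Map a (hypothetical) file name to the records it should contain.
    return {f"pd_{code}.gml": rows for code, rows in sorted(groups.items())}


files = split_by_municipality(records)
for name, rows in files.items():
    print(name, len(rows))
```

Each group would then be serialized to its own GML file, which keeps every individual file small at the cost of users having to handle thousands of files.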

We would like to hear if anyone has encountered similar issues with SU & PD data. 

    Hello Miroslaw,

    Thank you for setting up this site and for mentioning my paper!

    My first suggestion is to change the PD model so that StatisticalValues become "feature types" instead of just "types". In this way we can give the PD datasets geometry, the lack of which was my greatest concern in the paper mentioned above. For HH this is already the case.

    Hello Pieter,

    Thank you for your comment. I agree that StatisticalValue should be upgraded to a featureType, either instead of or alongside StatisticalDistribution. We were recently thinking of implementing our themes (SU & PD) as WFS, but for PD it doesn't make much sense, because with StatisticalDistribution as a featureType the whole distribution (= dataset) would be downloaded in each case. Feature access in WFS therefore adds no value over predefined dataset download services (e.g. ATOM).

    If you bring geometry into PD, then you have the same geometry twice (once in SU and once in PD). To me, the best option is to merge PD and SU into one theme. That way, your population data become attributes of SU.



    SU was meant as a separate theme so that its geometry could be used by other themes (as is the case for the Human Health package of the Human Health and Safety [HH] data theme). Therefore I don't think it is a good idea to merge PD into SU. However, as stated above, what HH has and PD lacks is feature types for separate statistical values (e.g. a measure of a disease or health services statistics), and that, in my opinion, is the direction the PD specification should take.

