INSPIRE Community Forum

Population distribution: how to export large files with HALE?

Hi,

After transforming my population distribution data (32.000 records) to the StatisticalDistribution schema with HALE, the "export transformed data" function is inactive. When I transform a subset of 20.000, then 15.000, then 10.000 records, it remains inactive. The export function works when I use a subset of 5000 records.

How can I get the export function to work with 32.000 records?

Thank you

Pierre Jamagne

  • Mirosław MIGACZ


    Dear Pierre,

    And what is the size of the exported GML with 5000 records?

    Are you trying to export data for the whole area of the country, or divided into lower administrative units?

    In order for the GMLs to be readable (file size) we had to publish PD data divided into municipalities (around 2500 datasets for Poland). Otherwise the GML would be too large.
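
    Splitting a source file by administrative unit before transformation could be sketched as follows. This is only an illustration under assumptions: it supposes the source data is a CSV file with a column identifying the unit (the column name `unit_code` and the `pd_<unit>.csv` output naming are hypothetical, not taken from any real dataset).

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_unit(source_csv, out_dir, unit_column="unit_code"):
    """Write one CSV per distinct value of unit_column; return the paths written.

    unit_column is a hypothetical column name; adapt it to the real source file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by administrative unit (a few tens of thousands of rows fit in memory).
    groups = defaultdict(list)
    with open(source_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[unit_column]].append(row)

    # Write each group to its own file, keeping the original header.
    written = []
    for unit, rows in sorted(groups.items()):
        target = out_dir / f"pd_{unit}.csv"
        with open(target, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```

    Each resulting file can then be loaded as a separate source and transformed on its own, which keeps every exported GML small.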

    Regards,

    Mirosław

  • Simon TEMPLER

    Hi Pierre,

    from the description of the problem and the fact that this is the Population Distribution schema, I can only guess that this might be a memory issue. Is the memory consumption bar in the status bar at the bottom of the window at 90 to 100% all the time?
    HALE only keeps the objects in memory that are currently being transformed, and for "normal" application schemas that works quite well, because a single feature is not that big. The problem in your case may be that in the Population Distribution schema you have to fit all data into a single Statistical Distribution object. That means all the data has to fit into the memory assigned to HALE at once.

    The working memory assigned to HALE by default is 800MB - you should increase that value and test again. You can change the value in the HALE.ini file (e.g. -Xmx2048m instead of -Xmx800m). The maximum value you can use here depends on your system (e.g. if it's 32 or 64 bit).
    Memory management also improves if you run the transformation through the command line interface rather than from within the HALE desktop application. In that case none of the memory is required for the user interface. If you use HALE with the user interface, it may help to close the map perspective and window.
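
    The memory setting change described above can also be scripted. The following is a minimal sketch, assuming the ini file contains the option on a line of its own (as in `-Xmx800m`); the function name `set_max_heap` and the sample file contents are illustrative assumptions, not the actual contents of a HALE installation.

```python
import re

# Matches a line consisting of an -Xmx option, e.g. "-Xmx800m".
# The lookahead tolerates trailing spaces and Windows line endings.
_XMX_LINE = re.compile(r"^-Xmx\d+[kKmMgG]?(?=[ \t\r]*$)", re.MULTILINE)


def set_max_heap(ini_text, megabytes):
    """Return ini_text with every -Xmx line replaced by -Xmx<megabytes>m."""
    if not _XMX_LINE.search(ini_text):
        raise ValueError("no -Xmx line found in the ini text")
    return _XMX_LINE.sub(f"-Xmx{megabytes}m", ini_text)


# Illustrative ini contents (assumed layout, not copied from a real HALE.ini).
sample = "-vmargs\n-Xms40m\n-Xmx800m\n"
updated = set_max_heap(sample, 2048)
```

    Working on the text rather than on the file directly makes it easy to keep a backup copy of the original ini before writing the updated version.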

    Hope that helps

    Best,
    Simon

  • Pierre JAMAGNE


    Dear Simon and Miroslaw,

    32.000 records source file: I tried again with -Xmx800m replaced by -Xmx6000m, but the "export transformed data" function still was not activated, so I could not produce any GML file from my transformed file.

    5.000 records source file: the resulting GML file is 8500 KB.

    I could not find how to use the command line mode. How can I reach it?

    Thank you

    P Jamagne

     

  • Simon TEMPLER

    Hi Pierre,

    were you able to verify that this is a memory issue? In HALE, is the memory indicator near 100% and not moving much when the transformation stalls?

    Documentation on running transformation via the command line can be found in the HALE help.

    Best,
    Simon

  • Pierre JAMAGNE


    Hi Simon,

    In the lower right corner of my screen I see something like "1221M of 2858", for instance, but the value varies. It ranges between 20 and 80 %.

     

    P Jamagne

     

  • Simon TEMPLER

    Hi Pierre,

    that doesn't sound like a memory issue, at least with the increased memory setting you made (in that case there should not be much movement any more and the usage should stay close to the limit you defined). Without the project and data at hand it's hard to judge what else could be the issue. Did you try letting it run for some time to see if it terminates on its own? Are there any other indicators that something is going wrong? Is a hard disk drive full, maybe?

    Best,
    Simon

  • Pierre JAMAGNE


    Dear Simon and Miroslaw,

    I redid the exercise at home on my personal laptop and everything went all right. I could export my 32.000 records without any problem. So the failure is linked to my office PC.

    Another question arises: when you split the total population variable into male and female, do you also have to record the total population variable? If yes, where do you record the title of this variable?

    Pierre Jamagne

  • Mirosław MIGACZ


    Dear Pierre,

    According to the PD spec there is no need to record the title for total population: "The classification attribute can be used to represent a set of distributions with respect to the way of splitting the statistical values into different groups using one or several classifications of the individuals according to their characteristics. No classification may happen for the “Total” values."

    Example from one of our GMLs:

                <pd:value>
                    <pd:StatisticalValue>
                        <pd:value>1399490.0</pd:value>
                        <pd:status xlink:href="http://inspire.ec.europa.eu/codeList/StatisticalDataStatusValue/final" xlink:title="finalna"/>
                        <pd:dimensions>
                            <pd:Dimensions>
                                <pd:spatial xlink:href="http://nts.stat.gov.pl/2.5.02/2013"/>
                                <pd:thematic>
                                    <pd:ClassificationItem>
                                        <pd:type xlink:href="http://inspire.ec.europa.eu/codelist/GenderValue/male" xlink:title="męska"/>
                                    </pd:ClassificationItem>
                                </pd:thematic>
                            </pd:Dimensions>
                        </pd:dimensions>
                    </pd:StatisticalValue>
                </pd:value>
                <pd:value>
                    <pd:StatisticalValue>
                        <pd:value>2909997.0</pd:value>
                        <pd:status xlink:href="http://inspire.ec.europa.eu/codeList/StatisticalDataStatusValue/final" xlink:title="finalna"/>
                        <pd:dimensions>
                            <pd:Dimensions>
                                <pd:spatial xlink:href="http://nts.stat.gov.pl/2.5.02/2013"/>
                            </pd:Dimensions>
                        </pd:dimensions>
                    </pd:StatisticalValue>
                </pd:value>
                <pd:value>
                    <pd:StatisticalValue>
                        <pd:value>1510507.0</pd:value>
                        <pd:status xlink:href="http://inspire.ec.europa.eu/codeList/StatisticalDataStatusValue/final" xlink:title="finalna"/>
                        <pd:dimensions>
                            <pd:Dimensions>
                                <pd:spatial xlink:href="http://nts.stat.gov.pl/2.5.02/2013"/>
                                <pd:thematic>
                                    <pd:ClassificationItem>
                                        <pd:type xlink:href="http://inspire.ec.europa.eu/codelist/GenderValue/female" xlink:title="żeńska"/>
                                    </pd:ClassificationItem>
                                </pd:thematic>
                            </pd:Dimensions>
                        </pd:dimensions>
                    </pd:StatisticalValue>
                </pd:value>

    For male and female population we used ClassificationItem, for the total population (the middle value in the example) there is no classification.
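
    A quick consistency check on such a file could look like the sketch below: it reads each StatisticalValue, treats the ones without a ClassificationItem as totals, and compares the total with the sum of the classified values. The wrapper element and the namespace URI `urn:example:pd` in the sample are placeholders (the real PD namespace is not shown in this thread), so the code matches elements by local name only. The sample is an abbreviated version of the example above.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Abbreviated sample; "urn:example:pd" and <root> are placeholders, not the real schema.
SAMPLE = """<root xmlns:pd="urn:example:pd" xmlns:xlink="http://www.w3.org/1999/xlink">
  <pd:value><pd:StatisticalValue>
    <pd:value>1399490.0</pd:value>
    <pd:dimensions><pd:Dimensions><pd:thematic><pd:ClassificationItem>
      <pd:type xlink:href="http://inspire.ec.europa.eu/codelist/GenderValue/male"/>
    </pd:ClassificationItem></pd:thematic></pd:Dimensions></pd:dimensions>
  </pd:StatisticalValue></pd:value>
  <pd:value><pd:StatisticalValue>
    <pd:value>2909997.0</pd:value>
    <pd:dimensions><pd:Dimensions/></pd:dimensions>
  </pd:StatisticalValue></pd:value>
  <pd:value><pd:StatisticalValue>
    <pd:value>1510507.0</pd:value>
    <pd:dimensions><pd:Dimensions><pd:thematic><pd:ClassificationItem>
      <pd:type xlink:href="http://inspire.ec.europa.eu/codelist/GenderValue/female"/>
    </pd:ClassificationItem></pd:thematic></pd:Dimensions></pd:dimensions>
  </pd:StatisticalValue></pd:value>
</root>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def read_values(xml_text):
    """Return a list of (classification_href_or_None, value) per StatisticalValue."""
    results = []
    for elem in ET.fromstring(xml_text).iter():
        if local(elem.tag) != "StatisticalValue":
            continue
        # The numeric value is a direct child named "value".
        number = next(float(c.text) for c in elem if local(c.tag) == "value")
        # A classification, if present, sits in ClassificationItem/type.
        classification = None
        for item in elem.iter():
            if local(item.tag) == "ClassificationItem":
                for child in item:
                    if local(child.tag) == "type":
                        classification = child.get(XLINK_HREF)
        results.append((classification, number))
    return results


values = read_values(SAMPLE)
total = sum(v for href, v in values if href is None)
classified = sum(v for href, v in values if href is not None)
```

    For the figures in the example, the unclassified total and the sum of the male and female values agree.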

  • Pierre JAMAGNE


    Simon, Miroslaw

    The transformation of my population distribution finally succeeded when I split my source file into 11 provinces. The largest resulting GML file for a province is 9.000 KB. For the provinces with a large number of records (around 2000-2500 records) I had to adjust hale.ini to raise the working memory setting to 8000, and I also had to reinstall HALE.

    For the grid population (32.000 records), my resulting GML file is 120.000 KB.

    So I think it would be nice if HALE could be improved so that such a source file can be transformed into one single file instead of 11 files.

    Anyway, thank you for your valuable help.

    Pierre Jamagne

This discussion is closed.


Statistics & Health


Join this group if you would like to share knowledge or ask questions regarding the INSPIRE implementation of Statistical Units [SU], Population Distribution (Demography) [PD] or Human Health and Safety [HH] data themes