INSPIRE Community Forum

On line validation of datasets

By Giacomo MARTIRANO Replies (11)

In the context of the European project eENVplus (the "project hosted in Italy" mentioned by Peter) we developed a Validation Service consisting of an implementation of the Abstract Test Suite (ATS) included in Annex A of the INSPIRE Data Specifications. The service makes use of the free OGC testing facility for GML 3.2 (ISO 19136:2007).

This executable test suite (ETS) verifies the conformance of GML datasets with respect to INSPIRE application schemas and also with respect to ISO 19136:2007 (GML 3.2.1).

Supplementary INSPIRE constraints can be verified using theme-specific Schematron files.

For those tests that cannot be automated, the ETS contains guidelines for manual execution.

For the time being, the full ETS (including Schematron file and guidelines) is available for the Protected Sites (PS) theme.

Validation against the application schema is available on-line for all the other data themes, and for most of them interfaces are provided that explain the INSPIRE ATS context in which the validation is performed.

Using the TEAM Engine functionality, the GML dataset can be supplied not only as a local file but also as a web resource, by entering its HTTP URL or the relevant WFS GetFeature request.
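
A minimal sketch of how such a WFS GetFeature request URL might be assembled before it is passed to the validator. The base URL `http://example.org/wfs` and the helper name `build_getfeature_url` are hypothetical; the parameter name `typeNames` follows the WFS 2.0 key-value encoding, although some servers also accept `TYPENAME`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getfeature_url(base_url, type_name, version="2.0.0"):
    """Assemble a WFS GetFeature request URL using key-value parameters."""
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeNames": type_name,  # WFS 2.0 parameter name
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; replace with the address of a real download service.
url = build_getfeature_url("http://example.org/wfs", "ps:ProtectedSite")

# Decode the query string again to confirm the parameters round-trip.
query = parse_qs(urlsplit(url).query)
print(url)
print(query["typeNames"])
```

The resulting string is what would be entered as the web resource instead of uploading a local file.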

The Test, Evaluation, And Measurement (TEAM) Engine, the official test harness used by OGC Compliance Program, and the GML testing facility have been:

  • checked out from GitHub OGC repositories (TEAM Engine version 4.0.5, GML Suite release r17)
  • installed on cloud server
  • customized (in terms of user interface)
  • enriched with theme-specific schematron rules provided by the eENVplus team

The work, still in progress, is reported within the MIG WG5.

Access the service:

We would be grateful for any feedback.

 

  • Stefania MORRONE

    By Stefania MORRONE

    The eENVplus Validation Service team has made available updated guidelines for the manual validation of those tests that cannot be automated in the PS ETS.

    Further details on the automated tests can be found here.

    More details on schematron files produced by eENVplus team are available here.

  • Iurie MAXIM

    Following the tests performed by Stefania on the Romanian N2K GML, we found some issues in the online validator that are worth knowing about.

    We tested the WFS 2.0 download service available at http://inspire.biodiversity.ro/WFS/RO_ENV_PS/wfs?service=wfs&version=2.0.0&request=GetFeature&TYPENAME=ps:ProtectedSite by using the online validator mentioned in the previous post.

    ISSUE 1:

    We noticed that the validator reported only one error: we had provided the value 'ProtectedLandscapeOrSeascape' for IUCN category V, while the validator expected 'protectedLandscapeOrSeascape'. In other words, we used 'P' instead of 'p' as the first letter of the name of IUCN category V. The error description read:

    ERROR DESCRIPTION: Protected sites must be labeled according to codelists ! Erroneous designation value ' ProtectedLandscapeOrSeascape ' found for the IUCN designation schema. .......(ps:designation='protectedLandscapeOrSeascape') 

    We replaced 'P' with 'p' and now the validator shows no errors. The WFS currently provides the value with a lowercase letter, but in future it will probably switch back to the uppercase letter, depending on the discussions that follow on this topic.

    However, the "correct" form is with 'P' and not with 'p' according to:

    - article 52 of EC Regulation 102/2011 that amends EC Regulation 1089/2010
    http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32011R0102&from=EN
    - Current Technical Guidelines for Protected Sites, page 95 (83) http://inspire.ec.europa.eu/documents/Data_Specifications/INSPIRE_DataSpecification_PS_v3.2.pdf
    - INSPIRE Registry http://inspire.ec.europa.eu/codelist/IUCNDesignationValue/ProtectedLandscapeOrSeascape

    Admittedly, it would be better to correct all the above regulations, guidelines and registries, since the lowercase form is more correct given that all other code list values are written in lowerCamelCase, except for acronyms.

    ISSUE 2:

    Following the discussions regarding the provision of empty geometry for some sites in CDDA, we provided the site RONPA0968 with no geometry, only attributes, as this is the only protected area with unknown boundaries (not identified in the field). Even though the geometry is not voidable according to article 9.1.1. of EC Regulation 1089/2010 and the current Technical Guidelines for Protected Sites, the validator did not trigger any error.

    The site without geometry can be retrieved with the following request to the WFS 2.0 download service:

    http://inspire.biodiversity.ro/WFS/RO_ENV_PS/wfs?service=wfs&version=2.0.0&request=GetFeature&TYPENAME=ps:ProtectedSite&featureId=RONPA0968

    ISSUE 3:

    As the only error found by Stefania relates to the fact that we included the up-to-date XMLSchema.xsd in the list of schemas to be used: is it really correct that http://inspire.ec.europa.eu/schemas/base/3.2/BaseTypes.xsd points to the older 'https://www.w3.org/2001/XMLSchema.xsd' schema instead of the newer, continuously updated 'http://www.w3.org/2009/XMLSchema/XMLSchema.xsd' schema, or is there a reason for pointing to the older, outdated schema?

    Iurie

  • Stefania MORRONE

    By Stefania MORRONE

    Hi, thank you for reporting.

    Just a few considerations:

    Issue 1:

    Since “code list means an open enumeration” (see CR (EU) No 1089/2010) and, for enumerations, “the attribute name conforms to the rules for attributes names, i.e. is a lowerCamelCase name..” (see DS section 5.2.3), code list values starting with an uppercase letter (hence ProtectedLandscapeOrSeascape) do not conform to the DS requirement.
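
The naming rule cited above can be illustrated with a small check. This is only a sketch under the assumption that "lowerCamelCase" means a lowercase first letter followed by ASCII letters or digits; the helper name `is_lower_camel_case` is hypothetical, and the real ETS may implement the rule differently (for example with Schematron).

```python
import re

# Assumed reading of lowerCamelCase: one lowercase letter, then letters/digits.
LOWER_CAMEL = re.compile(r"[a-z][A-Za-z0-9]*")


def is_lower_camel_case(value):
    """Return True if the whole value matches the assumed lowerCamelCase pattern."""
    return LOWER_CAMEL.fullmatch(value) is not None


for candidate in ("protectedLandscapeOrSeascape", "ProtectedLandscapeOrSeascape"):
    print(candidate, is_lower_camel_case(candidate))
```

Under this reading only the first spelling passes, which is the behaviour the validator showed for this value.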

    A change in the documents/registry should therefore be proposed.

    That said, I agree with you that there is an inconsistency between the value tested and the value found in the registry. So, based on the discussions that will follow on this topic, we are considering changing the test value from ‘protectedLandscapeOrSeascape’ to ‘ProtectedLandscapeOrSeascape’.

    Issue 2:

    It is a tricky issue.

    When I display your polygon I find

     <ps:geometry>

    <gml:MultiSurface gml:id="Geom.RO.ENV.PS.RONPA0968"/>

    </ps:geometry>

    From a formal point of view (conformance to the schema) there is no error, because the geometry element is present and not void.

    Could this be the workaround being sought for the CDDA empty geometry?

    Issue 3:

    I agree that the most recent should be preferred, provided that it validates with no errors.

    To be more precise:

    • When validating http://www.w3.org/2009/XMLSchema/XMLSchema.xsd with Oxygen, the following 2 error messages appear:
      "rcase-Recurse.2: There is not a complete functional mapping between the particles."
      "derivation-ok-restriction.5.4.2: Error for type 'all'.  The particle of the type is not a valid restriction of the particle of the base."
    • When validating http://www.w3.org/2009/XMLSchema/XMLSchema.xsd with Altova XMLSpy, the following message appears: "Attribute 'xpathDefaultNamespace' is not allowed in element <xs:schema>"
    • When validating your RO.ENV.PS.N2k.gml with the OGC GML test suite, the error message is:
      Test method compileXMLSchema: 
                    2 schema error(s) detected.
       
      Severity: ERROR
      Message: rcase-Recurse.2: There is not a complete functional mapping between the particles.
      Location:  line=969 column=30
      Severity: ERROR
      Message: derivation-ok-restriction.5.4.2: Error for type 'all'.  The particle of the type is not a valid restriction of the particle of the base.
      Location:  line=969 column=30 expected [false] but found [true]

    I also tried to import http://www.w3.org/2009/XMLSchema/XMLSchema.xsd with HALE (as target schema), but the import fails due to a schema error.

    Could this explain why the INSPIRE XSDs and http://schemas.opengis.net/wfs/2.0/wfs.xsd refer to the older 2001 version?

  • Iurie MAXIM

    Hi Stefania,

    Thank you too for the detailed reply. My comments can be found below:

    ISSUE 1:

    Clearly, as I mentioned before and as you confirm, it would be best to correct the regulation (probably not easy), the Technical Guidelines for PS and the INSPIRE registry for IUCN categories. However, until at least one of the above three references is corrected, 'P' must be used instead of 'p' for 'ProtectedLandscapeOrSeascape', at least in the future official EC validator.

    Do you know the exact procedure for modifying the regulation, the technical guidelines and the registry? We have some issues related to enumerations and code lists in our language, as the Directive was wrongly translated into Romanian and, even worse, it uses Turkish special characters instead of Romanian ones. Therefore all code lists derived from the Romanian text of the Directive are wrong, for example the names of the Spatial Data Themes.

    ISSUE 2

    Indeed, it is a workaround for the CDDA issue, allowing Member States (MS) to provide null geometry in this way even though the geometry is not voidable. However, I think that the validation of data and services must be more thorough, so as to trigger at least a warning, if not an error, when such tricks are used. I worked for ETC/BD with Brian Mac Sharry checking the data reported by MS, and we did not allow any such tricks; in the QA/QC report to the MS we included sentences along the lines of: "No coordinates of the geometry of the site boundary were provided for the following sites ....... This can be the case only if the protected site boundary is not known at all or if the location of the site is sensitive data for protection of rare species according to art 13.1(h) of INSPIRE Directive. Otherwise the coordinates of the protected sites boundaries must be provided"
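
A rough sketch of the kind of additional check suggested here: flag geometry properties whose GML geometry element carries no coordinates at all, as in the empty `gml:MultiSurface` discussed above. The function name `find_empty_geometries`, the placeholder namespace `urn:example:ps` in the sample, and the choice to match any element whose local name is `geometry` are assumptions for illustration; this is not how the eENVplus service or the OGC test suite is implemented.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"
# GML elements that hold actual coordinate values.
COORD_TAGS = {f"{{{GML_NS}}}{name}" for name in ("pos", "posList", "coordinates")}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def find_empty_geometries(gml_text):
    """Return the gml:id of every geometry element that contains no coordinates."""
    root = ET.fromstring(gml_text)
    empty = []
    for prop in root.iter():
        if local_name(prop.tag) != "geometry":
            continue
        for geom in prop:  # direct children, e.g. gml:MultiSurface
            has_coords = any(
                el.tag in COORD_TAGS and (el.text or "").strip()
                for el in geom.iter()
            )
            if not has_coords:
                empty.append(geom.get(f"{{{GML_NS}}}id"))
    return empty


# The 'ps' namespace below is a placeholder, not the real INSPIRE namespace.
SAMPLE = """<ps:ProtectedSite xmlns:ps="urn:example:ps"
    xmlns:gml="http://www.opengis.net/gml/3.2">
  <ps:geometry>
    <gml:MultiSurface gml:id="Geom.RO.ENV.PS.RONPA0968"/>
  </ps:geometry>
</ps:ProtectedSite>"""

for gml_id in find_empty_geometries(SAMPLE):
    print("WARNING: geometry without coordinates:", gml_id)
```

A validator could report such hits as warnings rather than errors, leaving room for sites whose boundaries are unknown or sensitive.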

    ISSUE 3:

    I checked the differences between http://www.w3.org/2009/XMLSchema/XMLSchema.xsd and https://www.w3.org/2001/XMLSchema.xsd. According to https://www.w3.org/2001/XMLSchema they are versions 1.1 and 1.0 of XML Schema, respectively. The differences between versions 1.0 and 1.1 are explained in articles such as https://blogs.oracle.com/rammenon/entry/xml_schema_11_what_you_need_to

    The WFS 2.0 schema (http://schemas.opengis.net/wfs/2.0/wfs.xsd) and the GML 3.2.1 schema (http://schemas.opengis.net/gml/3.2.1/gml.xsd) use XML Schema version 1.0.

    Key Changes since 1.0

    The changes since version 1.0 till date can be grouped into the following.

    • Validation
      • Rule based Validation - Love Schematron? 1.1 enables users to express cross-field validations and constraints using XPath 2.0.
    • Extensibility and Versioning
      • Wildcard Enhancements
        • Weak wildcard support - XSD 1.0 uses Unique Particle Attribution rule (UPA) to avoid ambiguity associated primarily with the usage of wildcard content. UPA restricts users from defining certain complex content models. XSD 1.1 attempts to address this using Weak wildcards that disambiguate using precedence rules.
        • Negative Wildcards and Multiple Namespaces
        • Open Content
      • <xsd:all> Model Group Changes  - Finally!
      • Vendor Unique Extensions - Enables vendors to ship their own primitive datatypes and facets
      • Conditional Inclusion - Enable creation of extensible schemas.
    • Miscellaneous Enhancements
      • Conditional Type Assignments
      • Inherited Attributes
      • Default Attribute Group - Enable globally shared set of attribute definitions.
      • Substitution Group Enhancements - Multiple element substitution related changes.
      • ID related changes - Enables, amongst others, multiple ID attributes for elements.
      • Redefine Replaced by Override - Richer overrides, more interoperability.
      • Target Namespace attribute on local elements and attribute declarations
    • New Datatypes and Facets
  • Iurie MAXIM

    In relation to ISSUE 1: BiosphereReserve

    Keep in mind that for ps:designation='BiosphereReserve' the online validator expects the value with a capital first letter, as in the Commission Regulation and the Technical Guidelines for PS, while for ps:designation='protectedLandscapeOrSeascape' it expects the value with a lowercase first letter, different from the Commission Regulation and the Technical Guidelines for PS, so there is an inconsistency. Either both BiosphereReserve and ProtectedLandscapeOrSeascape must be written as in the Commission Regulation, or both must start with a lowercase letter and the Commission Regulation, Technical Guidelines and Registry must be updated accordingly.

  • Stefania MORRONE

    By Stefania MORRONE

    Just to give a complete picture of the inconsistencies, please consider the UML data model for Protected Sites available online, in which 'protectedLandscapeOrSeascape' (lower case) and 'BiosphereReserve' (upper case) are specified.

    Taking into consideration the inconsistencies described in detail in this discussion, the eENVplus Validator currently accepts both lower case (protectedLandscapeOrSeascape, biosphereReserve) and upper case (ProtectedLandscapeOrSeascape, BiosphereReserve) encoding for these two code list values.
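
One way such tolerant matching could work is to ignore the case of the first letter only. The sketch below is an assumption about the approach, not the validator's actual rule; the set `ACCEPTED` holds just the two values named in this discussion, and the function names are hypothetical.

```python
# Only the two code list values discussed in this thread (illustrative subset).
ACCEPTED = {"protectedLandscapeOrSeascape", "biosphereReserve"}


def normalise_first_letter(value):
    """Lower-case the first character and leave the rest untouched."""
    return value[:1].lower() + value[1:]


def is_accepted_designation(value):
    """Accept a value whether it starts with an upper or lower case letter."""
    return normalise_first_letter(value) in ACCEPTED


for candidate in ("ProtectedLandscapeOrSeascape", "biosphereReserve"):
    print(candidate, is_accepted_designation(candidate))
```

Because only the first character is normalised, a spelling such as 'Protectedlandscapeorseascape' would still be rejected.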

  • Brian MACSHARRY

    By Brian MACSHARRY

    Thanks @Stefania and @Iurie for your comments on this.

    A few comments and observations.

    In the data specifications, camel case is used for code lists, which explains why it is 'protected' rather than 'Protected'. I agree that the online validation should check the correct codes.

    I personally do not like the names of the IUCN management categories, as they can be misleading; internationally the categories I, Ia, II, III etc. are used rather than the values used in the code lists. These category codes are used in the European protected area data flow, the CDDA, and in the World Database on Protected Areas. http://www.iucn.org/about/work/programmes/gpap_home/gpap_quality/gpap_pacategories/

    On the use of 0,0 to report sites with no geometry: I disagree with this and, as Iurie mentioned, it is a trick to solve the underlying issue of how to report Protected Sites which, for one reason or another, do not yet have a geometry. Thanks @Stefania for proposing a potential workaround. There are sensitive protected areas that Member States are not willing to report on, and the system needs to allow for this.

  • Stefania MORRONE

    By Stefania MORRONE

    A new version v2.0 of the eENVplus Validation Service is now available, which upgrades to release 1.22 of the OGC GML Test Suite and v4.4 of TEAM Engine test harness.

    Major new features:

    • Additional tests on the validity of geometry representations according to ISO 19107, referenced by the GML Specification
    • Tests related to the Coordinate Reference System (geometry lies within the valid area of the CRS)
    • Improved results overview
    • Helpful tips on how to deal with major validation issues

    Access the service:

    Any feedback is more than welcome!

  • Lena Hallin-Pihlatie

    By Lena Hallin-Pihlatie

    @ Stefania. Thank you very much for providing information on the new eENVplus Validation Service. We find the improved version very useful.

    In our national dataset a Natura 2000 site may comprise both polygons and lines. The Protected Sites Simple data model does not seem to put any restrictions on the geometry (GM_Object) of the spatial objects, implying that MultiGeometry is supported. However, if we include both polygons and lines (MultiSurface & MultiCurve) for the same spatial object in our GML, it does not pass validation in the eENVplus Validation Service. If we take the lines out, the GML passes the validation.

    Any comments on this from your side?

  • Stefania MORRONE

    By Stefania MORRONE

    Hi Lena,

    It's not a problem related to the use of the MultiGeometry element, because the eENVplus Validation Service does not execute any test relevant to it (to have a look at all tests that apply to GML geometry representations and envelopes, click here). Could you provide a sample GML file (even just one or two failing features, with both polygons and lines in the same geometry instance) so that I can be more precise?

    Thanks

    Stefania

     

This discussion is closed and is not accepting new comments.

Biodiversity & Area Management

If themes like Protected Sites, Area Management/Restriction/Regulation Zones and Reporting Units, Habitats and Biotopes, Species Distribution, and Bio-geographical Regions matter to you, join these groups!