Metadata Registries in NSW Govt

Hi champs,

I am looking at options for adopting a metadata registry for our team, with a view to perhaps sharing more widely within the cluster. We currently use a spreadsheet with a nifty Tableau front end, but I would like to improve interoperability and automated harvesting as simply as possible.

I’m aware that OEH’s Information Asset Register and data.nsw use CKAN, but I don’t know whether other departments are using any metadata registry tools, such as METeOR (AIHW), CKAN, Aristotle or Data61’s MAGDA (which powers data.gov.au).

Any tips appreciated.
With thanks, Drew


Hi Drew, outside of tools, are you investigating common metadata standards for interoperability between agencies? For instance, Dublin Core (https://www.dublincore.org/).

Hi James, I’m hoping that will be handled by the tool, but if not I would definitely look at adding DC, AGLS, etc.

Hi Drew

I’ll email you some further details. There is a group within Dept Communities and Justice that has recently implemented Aristotle for a specific metadata registry purpose (around KPIs).

As to your question, James: we in DCJ are looking at how we can reconvene a former FACS group called ISAG (Information Standards Advisory Group), which was set up to develop various information standards, utilising existing standards where possible. ISAG had some participants from across Govt years ago, but that fell away.

cheers
Mark


Fabulous, thanks Mark! I’m keen to see how they fared in terms of usability & implementation.

Hi Drew.
At Sydney Water we use Collibra. Happy to give you a demo and some feedback if you are interested…

Thanks Juanita, I’ll drop you a line outside the forum.

Ta, Drew

Hi Mark,

Do you know if there is any inter-agency group for metadata standards? We also had a group here for standards that we are looking at re-convening, the Enterprise Information Working Group.

Hi Drew

In the projects I work on, we implement linked data principles and tooling to create and manage the metadata, then provide an endpoint for a registry to consume to support sharing.
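To give a feel for the consumer side of that, here's a minimal sketch of a registry harvesting metadata from a published RDF endpoint, using Python's rdflib. The URL is a hypothetical placeholder, not a real endpoint:

```python
# Minimal harvest sketch: pull a published Turtle document into a
# graph and list everything carrying a Dublin Core title.
from rdflib import Graph
from rdflib.namespace import DCTERMS

g = Graph()
# rdflib fetches the document over HTTP and parses the triples.
g.parse("https://example.org/registry/datasets.ttl", format="turtle")

for resource, title in g.subject_objects(DCTERMS.title):
    print(resource, "->", title)
```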

In short, have a look at the work being implemented at the Geological Survey of Queensland (GSQ). It’s an excellent and achievable model for both sides of the process. Moreover, the whole approach has been written up for others to use: https://github.com/geological-survey-of-queensland/vocabularies

A colleague formerly from Geoscience Australia was the architect of this approach in a project GSQ engaged Geoscience Australia to undertake. The team at GSQ are now up and running with the methodology under their own steam.

As a bit more background…

The metadata management side of things has good options for tools (free and paid), base standards, domain-standard customization and interchange mechanisms. The GSQ work is a good mix of tools working together that meets the constraints of small teams and agencies at an early stage of governance maturity. The main challenges are twofold: 1. getting buy-in and everyone on board to implement it, and 2. a learning curve (for key people) and/or access to resources who can implement linked data, aka RDF, infrastructure. Once these aspects are on a roadmap, modelling, taxonomies, vocabularies, domain-specific standards and linking your data are, while not trivial, not that hard to get started with, and they deliver some key easy wins; the three things (buy-in, skills and depth of standards) mature well over time.
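To show how small the starting point can be, here is a sketch of a two-concept SKOS vocabulary in Python's rdflib. All URIs and labels are hypothetical placeholders, not GSQ's actual vocabularies:

```python
# A tiny SKOS vocabulary: one scheme, two concepts, one
# broader/narrower link. All URIs are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("https://example.org/vocab/")
g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

scheme = EX["environment"]
g.add((scheme, RDF.type, SKOS.ConceptScheme))

water = EX["water-quality"]
g.add((water, RDF.type, SKOS.Concept))
g.add((water, SKOS.prefLabel, Literal("Water quality", lang="en")))
g.add((water, SKOS.inScheme, scheme))

sampling = EX["water-sampling"]
g.add((sampling, RDF.type, SKOS.Concept))
g.add((sampling, SKOS.prefLabel, Literal("Water sampling", lang="en")))
g.add((sampling, SKOS.broader, water))
g.add((sampling, SKOS.inScheme, scheme))

print(g.serialize(format="turtle"))
```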

On the registry side of things, it is more of a mixed bag, with a different set of challenges and things to weigh up. There are only a small number of free options, and these require a degree of work to implement (though this is no different from CKAN or MAGDA). They are certainly achievable for small teams that have commitment, but do require a good roadmap addressing the two challenges above.
Then there are paid registry solutions, which are quite expensive. However, if the compliance requirements and risks are high, this is a price that must, at the end of the day, be confronted, and we see many more organisations, particularly Commonwealth agencies, agreeing that a linked data, web-enabled approach is the way forward for this compliance.

The reality for most agencies at state level, and even Commonwealth, is that they are at the beginning of the journey, so the GSQ approach is a good model to kick off with.

Just a quick additional note of detail, to help place RDF and Dublin Core in context…

At the lowest level, linked data is based on the W3C standards OWL and RDF, and is supported by a range of tools (free and commercial) that fully support those standards.

Any domain can be modelled using OWL/RDF, and by adopting the base standard any data can be used, linked and shared with any party that also adopts it. This includes re-using and linking to other standards, e.g. whole-of-government taxonomies, without vendor lock-in. This is the main strength of the approach.
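A minimal sketch of that linking strength, again with rdflib and hypothetical URIs: a local record points straight at a concept URI governed elsewhere, with no copying and no vendor format:

```python
# Sketch: a local record linking to an externally governed concept.
# Both URIs are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

OURS = Namespace("https://example.org/agency-a/")
# A concept URI as it might be published in a shared taxonomy.
shared_concept = URIRef("https://example.org/wog-taxonomy/water-quality")

g = Graph()
record = OURS["dataset/123"]
g.add((record, DCTERMS.title, Literal("River water quality readings")))
# The link itself: just the shared, global identifier.
g.add((record, DCTERMS.subject, shared_concept))

# Any party using the same taxonomy can merge graphs and the
# statements line up, because the identifiers are global.
print(g.serialize(format="turtle"))
```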

@james.bibby mentioned Dublin Core. This is just one such standard, using OWL/RDF as the underlying interoperability mechanism. By itself, though, Dublin Core only provides a very specific set of “attributes” (properties) and classes to define data. That is, it has a relatively narrow domain of applicability but is reasonably extensive (deep) within that domain: it’s for libraries or managers of catalogues/lists/collections of items, e.g. author, title, published date, plus other more abstract ones. This may be enough for some things… but it will quickly run out of expressiveness.

So for other domains, and especially for describing very specific data objects and properties, e.g. some aspect of water sampling data, it requires more work to extend Dublin Core, find other standards and customize them to define your domain-specific properties.
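As a rough sketch of that extension step: real Dublin Core terms for the generic fields, plus a hypothetical water-sampling namespace for the properties Dublin Core has no terms for:

```python
# Sketch: real Dublin Core terms for generic description, plus a
# hypothetical domain namespace for water-sampling specifics.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, XSD

WS = Namespace("https://example.org/def/water-sampling/")  # hypothetical
DATA = Namespace("https://example.org/data/")              # hypothetical
g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("ws", WS)

sample = DATA["sample-42"]
# Generic, interoperable metadata: plain Dublin Core.
g.add((sample, DCTERMS.title, Literal("Sample 42, river site 7")))
g.add((sample, DCTERMS.created, Literal("2019-11-05", datatype=XSD.date)))
# Domain-specific properties Dublin Core cannot express.
g.add((sample, WS.samplingDepthMetres, Literal(1.5)))
g.add((sample, WS.analyte, Literal("dissolved oxygen")))

print(g.serialize(format="turtle"))
```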

Happy to chat more if you have any questions

Cheers

Simon

Hi @james.bibby and @mark.holdsworth

Check out the Australian Linked Data Working Group: http://www.linked.data.gov.au/

Even if the linked data approach isn’t adopted, there is a range of controlled whole-of-government vocabularies and data vocabularies that are useful. There is also much material from which to learn the principles and methods of standards development and re-use, which have a proven track record. That said, achieving a similar level of interoperability, re-use and provenance outside the linked data toolset is generally not possible.

Cheers

Simon


Hi @Simon.Opper,

I would characterize the applicability of Dublin Core the other way around. It is not at all deep, but it is broad - hence its usefulness for metadata.

We have good examples of this, where we have successfully applied it across a diverse range of data sets to provide consistency and make it easy to automate data discovery and cataloging. Essentially we apply Dublin Core to describe the data, and then domain-specific data standards for the data itself.

See attached visualization.

Note that while it’s shown here at record level, we apply it up to data set, repository and service level as well.
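As a sketch of what that automation can look like (not our actual pipeline; the catalogue file is a hypothetical placeholder), one SPARQL query can discover every resource carrying consistent Dublin Core metadata, whatever its level:

```python
# Sketch: one discovery query over consistently applied Dublin Core
# metadata. The catalogue file is a hypothetical placeholder.
from rdflib import Graph

g = Graph()
g.parse("catalogue.ttl", format="turtle")

results = g.query("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?resource ?title WHERE {
        ?resource dcterms:title ?title ;
                  dcterms:subject ?subject .
        FILTER(CONTAINS(LCASE(STR(?subject)), "water"))
    }
""")
for resource, title in results:
    print(resource, "->", title)
```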

Hi James

I think this is a case of ‘it depends’ on the use case and requirements. To be clear, I’m not saying it is not useful. My main point was that it is one of a number of standards which can be combined to make a very expressive set of linked models and queries. Starting with Dublin Core and SKOS is a fundamental step onto the data governance pathway, but being aware of their limitations and having a view to future needs is the main insight I’m keen to share with anyone on the journey.

Dublin Core will allow for a level of specialisation/extension of its classes, suiting a wide range of use cases, that will meet a set of required competencies. But using extensions of only Dublin Core classes and properties will necessarily limit the complexity (competency) of the queries you can then ask of your data. For example:

- Can I get provenance of what events have occurred on that piece of data over time, and between applying standards or data quality fixes? No.
- Can I understand the spatio-temporal relation of data at different scales (data cubes, content negotiation and link-sets)? No.
- Can I query what unit of measure is used and how it relates to another? No.
- Is there a statement of quality? No.
- As a user from viewpoint 1, can I query what data is relevant to my data model (same for users 2 to n)? No.

Each of these facets has an extensive set of standards/vocabularies and ontologies which give a much greater ability to develop metadata and to query it, e.g. PROV-O, QB4ST, SOSA, SHACL.
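As a sketch of the provenance case with PROV-O (all URIs hypothetical), recording a data quality fix as an activity makes the first “no” above answerable:

```python
# Sketch: a data quality fix recorded with PROV-O, so the history of
# a piece of data becomes queryable. All URIs are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import PROV, RDF, XSD

EX = Namespace("https://example.org/")
g = Graph()

raw = EX["dataset/raw"]
fixed = EX["dataset/quality-fixed"]
fix = EX["activity/outlier-removal"]

g.add((fix, RDF.type, PROV.Activity))
g.add((fix, PROV.used, raw))
g.add((fix, PROV.endedAtTime,
       Literal("2020-02-01T10:00:00", datatype=XSD.dateTime)))
g.add((fixed, PROV.wasGeneratedBy, fix))
g.add((fixed, PROV.wasDerivedFrom, raw))

# "What events have occurred on that piece of data over time?"
for activity in g.subjects(PROV.used, raw):
    print("Activity on raw data:", activity)
for derived in g.subjects(PROV.wasDerivedFrom, raw):
    print("Derived output:", derived)
```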

If these more detailed data queries don’t need answering… happy days… a lean, minimal approach is the ticket (though you’d be surprised what value the others can easily bring). But for many projects of high importance they definitely are a requirement, and this is where this bunch of cool standards can really be leveraged :muscle:

Cheers mate
