<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>metadata | Automated Data Observatories</title>
    <link>/tag/metadata/</link>
      <atom:link href="/tag/metadata/index.xml" rel="self" type="application/rss+xml" />
    <description>metadata</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2020-2021 Daniel Antal</copyright><lastBuildDate>Mon, 08 Nov 2021 10:00:00 +0100</lastBuildDate>
    <image>
      <url>/media/icon_hub7eb2fbae5fdd7bfeda5a9178a9e4f33_23448_512x512_fill_lanczos_center_2.png</url>
      <title>metadata</title>
      <link>/tag/metadata/</link>
    </image>
    
    <item>
      <title>How We Add Value to Public Data With Imputation and Forecasting</title>
      <link>/post/2021-11-06-indicator_value_added/</link>
      <pubDate>Mon, 08 Nov 2021 10:00:00 +0100</pubDate>
      <guid>/post/2021-11-06-indicator_value_added/</guid>
      <description>&lt;p&gt;Public data sources are often plagued by missng values. Naively you may think that you can ignore them, but think twice: in most cases, missing data in a table is not missing information, but rather malformatted information. This approach of ignoring or dropping missing values will not be feasible or robust when you want to make a beautiful visualization, or use data in a business forecasting model, a machine learning (AI) applicaton, or a more complex scientific model. All of the above require complete datasets, and naively discarding missing data points amounts to an excessive waste of information. In this example we are continuing the example a not-so-easy to find public dataset.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-in-the-previous-blogpostpost2021-11-08-indicator_findable-we-explained-how-we-added-value-by-documenting-data-following-the-fair-principle-and-with-the-professional-curatorial-work-of-placing-the-data-in-context-and-linking-it-to-other-information-sources-such-as-other-datasets-books-and-publications-regardless-of-their-natural-language-ie-whether-these-sources-are-described-in-english-german-portugese-or-croatian-photo-jack-sloophttpsunsplashcomphotoseywn81spkj8&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/jack-sloop-eYwn81sPkJ8-unsplash.jpg&#34; alt=&#34;[In the previous blogpost](/post/2021-11-08-indicator_findable/) we explained how we added value by documenting data following the *FAIR* principle and with the professional curatorial work of placing the data in context, and linking it to other information sources, such as other datasets, books, and publications, regardless of their natural language (i.e., whether these sources are described in English, German, Portugese or Croatian). Photo: [Jack Sloop](https://unsplash.com/photos/eYwn81sPkJ8).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;/post/2021-11-08-indicator_findable/&#34;&gt;In the previous blogpost&lt;/a&gt; we explained how we added value by documenting data following the &lt;em&gt;FAIR&lt;/em&gt; principle and with the professional curatorial work of placing the data in context, and linking it to other information sources, such as other datasets, books, and publications, regardless of their natural language (i.e., whether these sources are described in English, German, Portuguese or Croatian). Photo: &lt;a href=&#34;https://unsplash.com/photos/eYwn81sPkJ8&#34;&gt;Jack Sloop&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;Completing missing data points requires knowledge of statistical production (why might the data be missing?) and data science know-how (how to impute the missing value). If you do not have a good statistician or data scientist on your team, you will need high-quality, complete datasets. This is what our automated data observatories provide.&lt;/p&gt;
&lt;h2 id=&#34;why-is-data-missing&#34;&gt;Why is data missing?&lt;/h2&gt;
&lt;p&gt;International organizations offer many statistical products, but usually on an ‘as-is’ basis. For example, Eurostat is the world’s premier statistical agency, but it has no right to overrule whatever data the member states of the European Union, and some other cooperating European countries, hand over to it. And it cannot force these countries to provide data if they fail to do so. As a result, many data points will be missing, and data points will often carry wrong (obsolete) descriptions or geographical dimensions. We will show the geographical aspect of the problem in a separate blogpost; for now, we focus only on missing data.&lt;/p&gt;
&lt;p&gt;Some countries have only recently started providing data to the Eurostat umbrella organization, and it is likely that you will find few datapoints for North Macedonia or Bosnia-Herzegovina. Other countries provide data with some delay, and the last one or two years are missing. And there are gaps in some countries’ data, too.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-see-the-authoritative-copy-of-the-datasethttpszenodoorgrecord5652118yykhvmdmkuk&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/trb_plot.png&#34; alt=&#34;See the authoritative copy of the [dataset](https://zenodo.org/record/5652118#.YYkhVmDMKUk).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      See the authoritative copy of the &lt;a href=&#34;https://zenodo.org/record/5652118#.YYkhVmDMKUk&#34;&gt;dataset&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;This is a headache if you want to use the data in a machine learning application or in a multiple or panel regression model. You can, of course, discard countries or years without full data coverage, but this approach usually wastes too much information&amp;ndash;if you work with 12 years and only one data point is missing, you would be discarding an entire country’s 11 years’ worth of data. Another option is to estimate, or otherwise impute, the missing data, when this is possible with reasonable precision. This is where things get tricky, and you will likely need a statistician or a data scientist onboard.&lt;/p&gt;
&lt;h2 id=&#34;what-can-we-improve&#34;&gt;What can we improve?&lt;/h2&gt;
&lt;p&gt;Consider a case where the data for a particular country is missing for a single year, 2015. The naive solution would be to omit 2015, or the country at hand, from the dataset. This is pretty destructive, because we know a lot about the radio market turnover in this country and in this year! But leaving 2015 blank will not look good on a chart, and it will make your machine learning application or your regression model fail.&lt;/p&gt;
&lt;p&gt;A statistician or a radio market expert will tell you that you more or less know the missing information: the total turnover was certainly not zero in that year. With some statistical or radio domain-specific knowledge you can use the 2014 or 2016 value, or a combination of the two, and keep the country and year in the dataset.&lt;/p&gt;
&lt;p&gt;Our improved dataset adds backcast data (using the best time series model fitted to the country&amp;rsquo;s actually observed data), forecast data (again, using the best time series model), and approximated data (using linear approximation). In a few cases, we add the last or next known value. To give a few quantitative indicators about our work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Number of observations: +65%&lt;/li&gt;
&lt;li&gt;Missing values: -48.1%&lt;/li&gt;
&lt;li&gt;Non-missing subset for regression or AI: +66.67%&lt;/li&gt;
&lt;/ul&gt;
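&lt;p&gt;The gap-filling logic described above&amp;ndash;linear approximation for interior gaps, model-based estimates at the edges&amp;ndash;can be illustrated with a minimal Python sketch. The turnover figures below are made up, and a simple carry-forward stands in for the best-fitting time series model:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical turnover series (million EUR) with missing years.
turnover = pd.Series(
    [31.2, None, 34.1, 35.0, None, None],
    index=[2013, 2014, 2015, 2016, 2017, 2018],
)

# Interior gap (2014): linear approximation between known neighbours.
approximated = turnover.interpolate(method="linear", limit_area="inside")

# Trailing gap (2017-2018): carry the last known value forward,
# a naive stand-in for a proper time series forecast.
completed = approximated.ffill()
```

&lt;p&gt;A production workflow would additionally record, for every filled-in point, which method produced it.&lt;/p&gt;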
&lt;p&gt;If your organization works with panel (longitudinal multiple) regressions or various machine learning applications, then your team knows that not having the +66.67% gain would be a deal-breaker in the choice of models and in the precision of estimates, KPIs, or other quantitative products. And that they would spend about 90% of their data resources on achieving this +66.67% gain in usability.&lt;/p&gt;
&lt;p&gt;If you happen to work in an NGO, a business unit, or a research institute that does not employ data scientists, then it is likely that you can never achieve this improvement, and you have to give up on a number of quantitative tools or visualizations. If you have a data scientist onboard, that professional can use our work as a starting point.&lt;/p&gt;
&lt;h2 id=&#34;can-you-trust-our-data&#34;&gt;Can you trust our data?&lt;/h2&gt;
&lt;p&gt;We believe that you can trust our data more than the original public source. We use statistical expertise to find out why data may be missing. Often, it is present but in the wrong location (for example, because the name of a region changed).&lt;/p&gt;
&lt;p&gt;If you are reluctant to use estimates, consider the alternative: discarding known, actual data from your forecast or visualization because one data point is missing. Which provides more accurate information: hiding known data because one point is missing, or using all known data plus an estimate?&lt;/p&gt;
&lt;p&gt;Our codebooks and our API use the &lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Statistical Data and Metadata eXchange&lt;/a&gt; documentation standards to clearly indicate which data is observed, which is missing, which is estimated, and of course, also how it is estimated.
This highlights another important aspect of data trustworthiness: if you have a better idea, you can replace our estimates with your own.&lt;/p&gt;
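&lt;p&gt;As a minimal sketch of what such flagging can look like in a tidy table (the country codes and values below are illustrative; &#34;A&#34;, &#34;E&#34;, and &#34;F&#34; are SDMX observation-status codes for actual, estimated, and forecast values):&lt;/p&gt;

```python
import pandas as pd

# Each observation carries an explicit status flag, so estimates
# are never silently mixed with observed data.
indicator = pd.DataFrame({
    "geo":        ["NL", "NL", "NL", "MK"],
    "year":       [2015, 2016, 2017, 2017],
    "value":      [34.1, 35.0, 35.6, 2.1],
    "obs_status": ["A",  "A",  "F",  "E"],
})

# Keep only directly observed values, discarding our estimates:
observed_only = indicator[indicator["obs_status"] == "A"]
```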
&lt;p&gt;Our indicators come with standardized codebooks that contain not only descriptive metadata, but also administrative metadata about the history of the indicator values. You will find very important information about the statistical methods we used to fill in the data gaps, and we even link to the reliable, peer-reviewed statistical software that made the calculations. For data scientists, we record plenty of information about the computing environment, too&amp;ndash;this can come in handy if your estimates need external validation, or if you suspect a bug.&lt;/p&gt;
&lt;h2 id=&#34;avoid-the-data-sisyphus&#34;&gt;Avoid the data Sisyphus&lt;/h2&gt;
&lt;p&gt;If you work in an academic institution, an NGO, or a consultancy, you can never be sure who downloaded the &lt;a href=&#34;https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=sbs_na_1a_se_r2&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Annual detailed enterprise statistics for services (NACE Rev. 2 H-N and S95)&lt;/a&gt; folder from Eurostat. Did they modify the dataset? Did they already make corrections for the missing data? What method did they use? To prevent many potential problems, you will likely download it again, and again, and again&amp;hellip;&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-see-our-the-data-sisyphushttpsreprexnlpost2021-07-08-data-sisyphus-blogpost&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png&#34; alt=&#34;See our [The Data Sisyphus](https://reprex.nl/post/2021-07-08-data-sisyphus/) blogpost.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      See our &lt;a href=&#34;https://reprex.nl/post/2021-07-08-data-sisyphus/&#34;&gt;The Data Sisyphus&lt;/a&gt; blogpost.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;We have a better solution. You can always rely on our API to import the latest, best data directly, but if you want to be sure, you can use our &lt;a href=&#34;https://zenodo.org/record/5652118#.YYhGOGDMLIU&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;regular backups&lt;/a&gt; on Zenodo. Zenodo is an open science repository managed by CERN and supported by the European Union. On Zenodo, you can find an authoritative copy of our indicator (and its previous versions) with a digital object identifier, in this case, &lt;a href=&#34;https://doi.org/10.5281/zenodo.5652118&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;10.5281/zenodo.5652118&lt;/a&gt;. These datasets will be preserved for decades, and nobody can manipulate them. You cannot accidentally overwrite them, and we have no backdoor access to modify them.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://doi.org/10.5281/zenodo.5652118&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;&lt;img src=&#34;https://zenodo.org/badge/DOI/10.5281/zenodo.5652118.svg&#34; alt=&#34;DOI&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Are you a data user? Give us some feedback! Shall we do some further automatic data enhancements with our datasets? Document with different metadata? Link more information for business, policy, or academic use? Please  give us any &lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;feedback&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>How We Add Value to Public Data With Better Curation And Documentation</title>
      <link>/post/2021-11-08-indicator_findable/</link>
      <pubDate>Mon, 08 Nov 2021 09:00:00 +0100</pubDate>
      <guid>/post/2021-11-08-indicator_findable/</guid>
      <description>&lt;p&gt;In this example, we show a simple indicator: the &lt;em&gt;Turnover in Radio Broadcasting Enterprises&lt;/em&gt; in many European countries. This is an important demand driver in the &lt;a href=&#34;https://music.dataobservatory.eu/#pillars&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Music economy pillar&lt;/a&gt; of our Digital Music Observatory, and important indicator in our more general &lt;a href=&#34;https://ccsi.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cultural &amp;amp; Creative Sectors and Industries Observatory&lt;/a&gt;. We show a very similar example in our &lt;em&gt;Green Deal Data Observatory&lt;/em&gt; with &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-08-indicator_findable/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;environmental R&amp;amp;D public spending in Europe&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This dataset comes from a public datasource, the data warehouse of the
European statistical agency, Eurostat. Yet it is not trivial to use:
unless you are familiar with national accounts, you will not find &lt;a href=&#34;https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=sbs_na_1a_se_r2&amp;amp;lang=en&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this dataset&lt;/a&gt; on the Eurostat website.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-the-data-can-be-retrieved-from-the-annual-detailed-enterprise-statistics-for-services-nace-rev2-h-n-and-s95-eurostat-folder&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/eurostat_radio_broadcasting_turnover.png&#34; alt=&#34;The data can be retrieved from the Annual detailed enterprise statistics for services NACE Rev.2 H-N and S95 Eurostat folder.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      The data can be retrieved from the Annual detailed enterprise statistics for services NACE Rev.2 H-N and S95 Eurostat folder.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;Our version of this statistical indicator is documented following the &lt;a href=&#34;https://www.go-fair.org/fair-principles/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FAIR principles&lt;/a&gt;: our data assets
are findable, accessible, interoperable, and reusable. While the
Eurostat data warehouse partly fulfills these important data quality
expectations, we can improve on them significantly. And we can improve
the dataset itself, too, as we will show in the &lt;a href=&#34;/post/2021-11-06-indicator_value_added/&#34;&gt;next blogpost&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;findable-data&#34;&gt;Findable Data&lt;/h2&gt;
&lt;p&gt;Our data observatories add value by curating the data&amp;ndash;we bring this
indicator to light with a more descriptive name, and we place it in
context with our &lt;a href=&#34;https://music.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Digital Music Observatory&lt;/a&gt; and &lt;a href=&#34;https://ccsi.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Cultural &amp;amp; Creative Sectors and Industries Observatory&lt;/a&gt;.
While many people in the creative sectors, or among cultural policy designers, may need this dataset, most of them have no training in working with national accounts, which means deciphering national account data codes in records that measure economic activity at a national level. Our curated data observatories bring together many available datasets around important domains. Our &lt;em&gt;Digital Music Observatory&lt;/em&gt;, for example, aims to form an ecosystem of music data users and producers.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-we-added-descriptive-metadatahttpszenodoorgrecord5652113yykvbwdmkuk-that-help-you-find-our-data-and-match-it-with-other-relevant-data-sources&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/zenodo_metadata_eurostat_radio_broadcasting_turnover.png&#34; alt=&#34;We [added descriptive metadata](https://zenodo.org/record/5652113#.YYkVBWDMKUk) that help you find our data and match it with other relevant data sources.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      We &lt;a href=&#34;https://zenodo.org/record/5652113#.YYkVBWDMKUk&#34;&gt;added descriptive metadata&lt;/a&gt; that help you find our data and match it with other relevant data sources.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;We added descriptive metadata that help you find our data and match it
with other relevant data sources. For example, we add keywords and
standardized metadata identifiers from the Library of Congress Linked
Data Service, probably the world’s largest standardized knowledge
description library. This ensures that you can find relevant data
around the same key term (&lt;a href=&#34;https://id.loc.gov/authorities/subjects/sh85110448.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;radio broadcasting&lt;/a&gt;)
in addition to our turnover data. This allows connecting our dataset unambiguously
with other information sources that use the same concept but may be listed under
different keywords, such as &lt;em&gt;Radio–Broadcasting&lt;/em&gt;, or &lt;em&gt;Radio industry and
trade&lt;/em&gt;, or maybe &lt;em&gt;Hörfunkveranstalter&lt;/em&gt; in German, or &lt;em&gt;Emitiranje
radijskog programa&lt;/em&gt; in Croatian or &lt;em&gt;Actividades de radiodifusão&lt;/em&gt; in
Portuguese.&lt;/p&gt;
&lt;h2 id=&#34;accessible-data&#34;&gt;Accessible Data&lt;/h2&gt;
&lt;p&gt;Our data is accessible in two forms: in CSV tabular format (which can be
read with Excel, OpenOffice, Numbers, SPSS, and many similar spreadsheet
or statistical applications) and in JSON for automated importing into
your databases. We can also provide our users with SQLite databases,
which are fully functional, single-user relational databases.&lt;/p&gt;
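&lt;p&gt;A minimal Python sketch of the three access forms (the file names and the three-row indicator below are illustrative):&lt;/p&gt;

```python
import json
import sqlite3
import pandas as pd

df = pd.DataFrame({
    "geo":   ["NL", "DE", "HR"],
    "year":  [2016, 2016, 2016],
    "value": [35.0, 410.2, 12.7],
})

# CSV for spreadsheet and statistical applications ...
df.to_csv("indicator.csv", index=False)

# ... JSON for automated importing into databases ...
with open("indicator.json", "w") as f:
    json.dump(df.to_dict(orient="records"), f)

# ... and a fully functional, single-user SQLite database.
with sqlite3.connect("indicator.db") as conn:
    df.to_sql("indicator", conn, if_exists="replace", index=False)
    rows = conn.execute("SELECT COUNT(*) FROM indicator").fetchone()[0]
```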
&lt;p&gt;Tidy datasets are easy to manipulate, model, and visualize, and have a
specific structure: each variable is a column, each observation is a
row, and each type of observational unit is a table. This makes the data
easier to clean, and far easier to use in a much wider range of
applications than the original data. In theory, this is a simple objective,
yet we find that even governmental statistical agencies&amp;ndash;and even scientific
publications&amp;ndash;often publish untidy data. This poses a significant problem and implies
productivity losses: tidying data requires long hours of investment, and if
a reproducible workflow is not used, data integrity can also be compromised:
chances are that the process of tidying will overwrite, delete, or omit a data point or a label.&lt;/p&gt;
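&lt;p&gt;The tidying step itself can be sketched in a few lines of Python, pivoting an untidy, one-column-per-year extract (values illustrative) into one row per observation:&lt;/p&gt;

```python
import pandas as pd

# Untidy: each year is a separate column.
wide = pd.DataFrame({
    "geo":  ["NL", "DE"],
    "2015": [34.1, 402.8],
    "2016": [35.0, 410.2],
})

# Tidy: one row per country-year observation.
tidy = wide.melt(id_vars="geo", var_name="year", value_name="value")
tidy["year"] = tidy["year"].astype(int)
```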
&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-tidy-datasetshttpsr4dshadconztidy-datahtml-are-easy-to-manipulate-model-and-visualize-and-have-a-specific-structure-each-variable-is-a-column-each-observation-is-a-row-and-each-type-of-observational-unit-is-a-table&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/tidy-8.png&#34; alt=&#34;[Tidy datasets](https://r4ds.had.co.nz/tidy-data.html) are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;https://r4ds.had.co.nz/tidy-data.html&#34;&gt;Tidy datasets&lt;/a&gt; are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;While the original data source, the Eurostat data warehouse, is
accessible, too, we added value by bringing the data into a &lt;a href=&#34;https://www.jstatsoft.org/article/view/v059i10&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;tidy
format&lt;/a&gt;. Tidy data can
immediately be imported into a statistical application like SPSS or
STATA, or into your own database. It is immediately available for
plotting in Excel, OpenOffice, or Numbers.&lt;/p&gt;
&lt;h2 id=&#34;interoperability&#34;&gt;Interoperability&lt;/h2&gt;
&lt;p&gt;Our data can be easily imported with, or joined with data from other internal or external sources.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-all-our-indicators-come-with-standardized-descriptive-metadata-and-statistical-processing-metadata-see-our-apihttpsapimusicdataobservatoryeudatabasemetadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/DMO_API_metadata_table.png&#34; alt=&#34;All our indicators come with standardized descriptive metadata, and statistical (processing) metadata. See our [API](https://api.music.dataobservatory.eu/database/metadata/) &#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      All our indicators come with standardized descriptive metadata, and statistical (processing) metadata. See our &lt;a href=&#34;https://api.music.dataobservatory.eu/database/metadata/&#34;&gt;API&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;All our indicators come with standardized descriptive metadata,
following two important standards, &lt;a href=&#34;https://dublincore.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dublin Core&lt;/a&gt; and
&lt;a href=&#34;https://datacite.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;DataCite&lt;/a&gt;–implementing not only the mandatory,
but also the recommended, descriptions. This makes it far easier to
connect the data with other data sources, e.g. turnover with the number of radio broadcasting enterprises or
radio stations within specific territories.&lt;/p&gt;
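&lt;p&gt;As a small illustration, a few Dublin Core terms for this indicator could be recorded like this (the field values below are examples, not the authoritative record; the subject matches the Library of Congress heading cited above):&lt;/p&gt;

```python
# Illustrative Dublin Core descriptive metadata for the indicator.
metadata = {
    "dcterms:title":      "Turnover in Radio Broadcasting Enterprises",
    "dcterms:subject":    "radio broadcasting",   # LoC sh85110448
    "dcterms:identifier": "https://zenodo.org/record/5652113",
    "dcterms:language":   "en",
}

# Standardized field names make it easy to match this record with
# other catalogued datasets that use the same vocabulary.
```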
&lt;p&gt;Our passion for documentation standards and best practices goes much further: our data uses &lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Statistical Data and Metadata eXchange&lt;/a&gt; standardized codebooks, unit descriptions and other statistical and administrative metadata.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-we-participate-in-scientific-workhttpsreprexnlpublicationeuropean_visibilitiy_2021-related-to-data-interoperability&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/reports/european_visbility_publication.png&#34; alt=&#34;We participate in [scientific work](https://reprex.nl/publication/european_visibilitiy_2021/) related to data interoperability.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      We participate in &lt;a href=&#34;https://reprex.nl/publication/european_visibilitiy_2021/&#34;&gt;scientific work&lt;/a&gt; related to data interoperability.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;h2 id=&#34;reuse&#34;&gt;Reuse&lt;/h2&gt;
&lt;p&gt;All our datasets come with standardized information about reusability.
We add citation, attribution data, and licensing terms. Most of our
datasets can be used without commercial restriction after acknowledging
the source, but we sometimes work with less permissive data licenses.&lt;/p&gt;
&lt;p&gt;In the case presented here, we added further value to encourage re-use. In addition to tidying, we
significantly increased the usability of public data by handling
missing cases. This is the subject of our next blogpost.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Are you a data user? Give us some feedback! Shall we do some further
automatic data enhancements with our datasets? Document with different
metadata? Link more information for business, policy, or academic use? Please
give us any &lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;feedback&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The Data Sisyphus</title>
      <link>/post/2021-07-08-data-sisyphus/</link>
      <pubDate>Thu, 08 Jul 2021 09:00:00 +0200</pubDate>
      <guid>/post/2021-07-08-data-sisyphus/</guid>
      <description>&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-sisyphus-was-punished-by-being-forced-to-roll-an-immense-boulder-up-a-hill-only-for-it-to-roll-down-every-time-it-neared-the-top-repeating-this-action-for-eternity--this-is-the-price-that-project-managers-and-analysts-pay-for-the-inadequate-documentation-of-their-data-assets&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png&#34; alt=&#34;Sisyphus was punished by being forced to roll an immense boulder up a hill only for it to roll down every time it neared the top, repeating this action for eternity.  This is the price that project managers and analysts pay for the inadequate documentation of their data assets.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Sisyphus was punished by being forced to roll an immense boulder up a hill only for it to roll down every time it neared the top, repeating this action for eternity.  This is the price that project managers and analysts pay for the inadequate documentation of their data assets.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;&lt;em&gt;When was a file downloaded from the internet? What has happened with it since? Are there updates? Was a bibliographical reference made for quotations? Were missing values imputed? Was currency translated? Who knows about it – who created a dataset, who contributed to it? Which is an intermediate version of a spreadsheet file, and which is the final one, checked and approved by a senior manager?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Big data creates inequality and injustice. One aspect of this inequality is the cost of data processing and documentation – a greatly underestimated, and usually unreported, cost item. In small organizations, where there are no separate data science and data engineering roles, data is usually processed and documented by (junior) analysts or researchers. This is a very important source of the gap between Big Tech and everyone else: the data usually ends up very expensive, ill-formatted, and unreadable by the computers that run machine learning and AI. The documentation steps are usually omitted completely.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Data is potential information, analogous to potential energy: work is required to release it.” &amp;ndash; Jeffrey Pomerantz&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Metadata, which is information about the history of the data and about how it can be technically and legally reused, has a hidden cost. Cheap or low-quality external data comes with poor or no metadata, and small organizations lack the resources to add high-quality metadata to their datasets. However, this only perpetuates the problem.&lt;/p&gt;
&lt;h2 id=&#34;metadata-unbillable-hours&#34;&gt;The hidden cost item behind the unbillable hours&lt;/h2&gt;
&lt;p&gt;As we have shown with our research partners, such metadata problems are not unique to data analysis. Independent artists and small labels suffer on music or book sales platforms, because their copyrighted content is not well documented. If you automatically document tens of thousands of songs or datasets, the documentation cost per item is very small. If you do it manually, the cost may be higher than the expected revenue from the song, or than the total cost of the dataset itself. (See our research consortium&#39;s preprint paper: &lt;a href=&#34;https://dataandlyrics.com/publication/european_visibilitiy_2021/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;In the short run, small consultancies, NGOs, or, as a matter of fact, musicians, logically seem to give up on high-quality documentation and logging. In the long run, this has two devastating consequences: computers, such as machine learning algorithms, cannot read their documents, data, or songs. And as memory fades, the ill-documented resources need to be re-created, re-checked, and reformatted. Often, they are even hard to find on your internal server or laptop archive.&lt;/p&gt;
&lt;p&gt;Metadata is a hidden destroyer of the competitiveness of corporate or academic research, and of independent content management. It is never quoted on external data vendor invoices, and it is not planned as a cost item, because metadata, the description of a dataset, a document, a presentation, or a song, is meaningless without the resource that it describes. You never buy metadata. But if your dataset comes without proper metadata documentation, you are bound, like Sisyphus, to search for it, to re-arrange it, to check its currency units, its digits, its formatting. Data analysts are reported to spend about 80% of their working hours on data processing rather than data analysis &amp;ndash; partly because data processing is a very laborious task that computers can do at scale far more cheaply, and partly because they do not know whether the person who sat at the same desk before them has already performed these tasks, or whether the person responsible for quality control checked for errors.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;
&lt;figure  id=&#34;figure-uncut-diamonds-need-to-be-cut-polished-and-you-have-to-make-sure-that-they-come-from-a-legal-source-data-is-similar-it-needs-to-be-tidied-up-checked-and-documented-before-use-photo-dave-fischer&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/gems/Uncut-diamond_Edit.jpg&#34; alt=&#34;Uncut diamonds need to be cut, polished, and you have to make sure that they come from a legal source. Data is similar: it needs to be tidied up, checked and documented before use. Photo: Dave Fischer.&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Uncut diamonds need to be cut, polished, and you have to make sure that they come from a legal source. Data is similar: it needs to be tidied up, checked and documented before use. Photo: Dave Fischer.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;Undocumented data is hardly informative – it may be a page in a book, a file in an obsolete file format on a governmental server, or an Excel sheet that you do not remember having checked for updates.  Most data is useless, because we do not know how it can inform us, or we do not know if we can trust it.  The processing can be a daunting task, not to mention the most boring and often neglected documentation duties after the dataset is final and pronounced error-free by the person in charge of quality control.&lt;/p&gt;
&lt;h2 id=&#34;observatory-metadata-services&#34;&gt;Our observatory automatically processes and documents the data&lt;/h2&gt;
&lt;p&gt;The good news about documentation and data validation costs is that they can be shared.  If many users need GDP/capita data from all over the world in euros, then it is enough if a single entity, a data observatory, collects all GDP and population data expressed in dollars, korunas, and euros, and makes sure that the latest data is correctly converted to euros and then correctly divided by the latest population figures. These tasks are error-prone, and should not be repeated by every data journalist, NGO employee, PhD student, or junior analyst.  This is one of the services of our data observatory.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The tidy data format means that the data has a uniform and clear data structure and semantics, therefore it can be automatically validated for many common errors and can be automatically documented by either our software or any other professional data science application. It is not as strict as the schema for a relational database, but it is strict enough to make, among other things, importing into a database easy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The descriptive metadata contains information on how to find the data, access the data, join it with other data (interoperability) and use it, and reuse it, even years from now. Among others, it contains file format information and intellectual property rights information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The processing metadata makes the data usable in strictly regulated professional environments, such as in public administration, law firms, investment consultancies, or in scientific research. We give you the entire processing history of the data, which makes peer-review or external audit much easier and cheaper.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The authoritative copy is held at an independent repository and has a globally unique identifier, which protects you from accidental data loss or from mixing it up with unfinished and untested versions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
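The tidy data principle above can be illustrated with a short sketch. The table, countries, and figures below are made up for illustration only: converting a wide, one-column-per-year table into the one-observation-per-row tidy format is what makes automatic validation and database import straightforward.

```python
# A messy, wide table: one row per country, one column per year
# (the countries and figures are hypothetical examples).
messy = [
    {"geo": "NL", "2019": 52.3, "2020": 51.1},
    {"geo": "BE", "2019": 46.7, "2020": 45.9},
]

def to_tidy(rows, id_col, time_col, value_col):
    """Melt wide year columns into tidy format: one observation per row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            tidy.append({id_col: row[id_col], time_col: key, value_col: value})
    return tidy

tidy = to_tidy(messy, "geo", "year", "value")
# Every observation is now a single row with clear semantics, which is
# easy to validate, document, or import into a relational database.
```

Each tidy row carries its full context (country, period, value), so a validator or database importer never has to guess what a column header means.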
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-cutting-the-dataset-to-a-format-with-clear-semantics-and-documenting-it-with-the-fair-metadata-concep-exponentially-increases-the-value-of-data-it-can-be-publisehd-or-sold-at-a-premium-photo-andere-andrehttpscommonswikimediaorgwindexphpcurid4770037&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/gems/Diamond_Polisher.jpg&#34; alt=&#34;Cutting the dataset to a format with clear semantics and documenting it with the FAIR metadata concept exponentially increases the value of data. It can be published or sold at a premium. Photo: [Andere Andre](https://commons.wikimedia.org/w/index.php?curid=4770037).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Cutting the dataset to a format with clear semantics and documenting it with the FAIR metadata concept exponentially increases the value of data. It can be published or sold at a premium. Photo: &lt;a href=&#34;https://commons.wikimedia.org/w/index.php?curid=4770037&#34;&gt;Andere Andre&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;While humans are much better at analysing information, and human agency is required for trustworthy AI, computers are much better at processing and documenting data.  We apply these principles to our data service: we always process the data into the tidy format, we create an authoritative copy, and we always automatically add descriptive and processing metadata.&lt;/p&gt;
&lt;h2 id=&#34;value-of-metadata&#34;&gt;The value of metadata&lt;/h2&gt;
&lt;p&gt;Metadata is often more valuable and more costly to produce than the data itself, yet it remains an elusive concept for senior or financial management.  Metadata is information about how to correctly use the data, and it has no value without the data itself.  Data acquisition costs, such as buying from a data vendor, paying an opinion polling company, or hiring external data consultants, appear among the material costs, but metadata is never sold alone, so you never see its cost.&lt;/p&gt;
&lt;p&gt;In most cases, the reason why &lt;a href=&#34;https://dataandlyrics.com/post/2021-06-18-gold-without-rush/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;there is no gold rush for open data&lt;/a&gt; is the fact that while the EU member states annually release billions of euros&#39; worth of data for free, or at very low cost, it comes without proper metadata.&lt;/p&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-data-as-serviceservicesdata-as-servicereusable-legal-easy-to-import-interoperable-always-fresh-data-in-tidy-formats-with-a-modern-api-photo-edgar-sotohttpsunsplashcomphotosgb0bzgae1nk&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/gems/edgar-soto-gb0BZGae1Nk-unsplash.jpg&#34; alt=&#34;[Data-as-Service](/services/data-as-service/)Reusable, legal, easy-to-import, interoperable, always fresh data in tidy formats with a modern API. Photo: [Edgar Soto](https://unsplash.com/photos/gb0BZGae1Nk).&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;a href=&#34;/services/data-as-service/&#34;&gt;Data-as-Service&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Reusable, legal, easy-to-import, interoperable, always fresh data in tidy formats with a modern API. Photo: &lt;a href=&#34;https://unsplash.com/photos/gb0BZGae1Nk&#34;&gt;Edgar Soto&lt;/a&gt;.
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;p&gt;If the data source is cheap or low quality, you do not even get the metadata.  If you do not have it, it will show up as a human resource cost in research (when your analysts or junior researchers spend countless hours finding the missing metadata information on the correct use of the data) or in sales costs (when you try to reuse a research, consulting, or legal product and have to comb through your archive and retest elements again and again).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; The data, together with the descriptive and administrative metadata, and links to the use license and the authoritative copy can be found in our API. Try it out!&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Metadata</title>
      <link>/services/metadata/</link>
      <pubDate>Wed, 07 Jul 2021 00:00:00 +0000</pubDate>
      <guid>/services/metadata/</guid>
      <description>&lt;p&gt;&lt;em&gt;Adding metadata exponentially increases the value of data. Did your region add a new town to its boundaries? How do you adjust old data to conform to constantly changing geographic boundaries? What are some practical ways of combining satellite sensory data with my organization&amp;rsquo;s records? And do I have the right to do so? Metadata logs the history of data, providing instructions on how to reuse it, also setting the terms of use. We automate this labor-intensive process applying the FAIR data concept.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In our observatory we apply the concept of &lt;a href=&#34;#FAIR&#34;&gt;FAIR&lt;/a&gt; (&lt;strong&gt;f&lt;/strong&gt;indable, &lt;strong&gt;a&lt;/strong&gt;ccessible, &lt;strong&gt;i&lt;/strong&gt;nteroperable, and &lt;strong&gt;r&lt;/strong&gt;eusable digital assets) in our APIs and in our open-source statistical software packages.&lt;/p&gt;
&lt;h2 id=&#34;the-hidden-cost-item&#34;&gt;The hidden cost item&lt;/h2&gt;
&lt;p&gt;Metadata gets less attention than data because it is never acquired separately and is not on the invoice; it therefore remains a hidden cost, even though it is more important from a budgeting and usability point of view than the data itself. Metadata is responsible for many non-billable hours in industry and uncredited working hours in academia. Poor data documentation, the lack of reproducible processing and testing logs, inconsistent use of currencies and keywords, and storing &lt;a href=&#34;#messy-data&#34;&gt;messy data&lt;/a&gt; make reusability, interoperability, and integration with other information impossible.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&#34;#FAIR-data&#34;&gt;FAIR Data and the Added Value of Rich Metadata&lt;/a&gt; we introduce how we apply the concept of &lt;a href=&#34;#FAIR&#34;&gt;FAIR&lt;/a&gt; (&lt;strong&gt;f&lt;/strong&gt;indable, &lt;strong&gt;a&lt;/strong&gt;ccessible, &lt;strong&gt;i&lt;/strong&gt;nteroperable, and &lt;strong&gt;r&lt;/strong&gt;eusable digital assets) in our APIs.&lt;/p&gt;
&lt;p&gt;Organizations pay many times for the same, repeated work, because these boring tasks, which often comprise tens of thousands of microtasks, are neglected. Our solution creates automatic documentation and metadata for your own historical internal data or for acquisitions from data vendors. We apply the more general &lt;a href=&#34;#Dublin-Core&#34;&gt;Dublin Core&lt;/a&gt; and the more specific, mandatory and recommended values of &lt;a href=&#34;#DataCite&#34;&gt;DataCite&lt;/a&gt; for datasets &amp;ndash; these are new requirements in EU-funded research from 2021. But they are just the minimal steps, and there is a lot more to do to create a diamond ring from an uncut gem.&lt;/p&gt;
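As a minimal sketch of what such an automated check does, the validator below flags records that lack any of the DataCite 4.4 mandatory properties before publication. The field names follow the DataCite schema; the record itself, including its identifier, is entirely hypothetical.

```python
# DataCite 4.4 mandatory properties: Identifier, Creator, Title,
# Publisher, PublicationYear, ResourceType.
MANDATORY = [
    "identifier", "creator", "title",
    "publisher", "publicationYear", "resourceType",
]

def missing_mandatory(record):
    """Return the mandatory DataCite fields that are absent or empty."""
    return [field for field in MANDATORY if not record.get(field)]

# A hypothetical, incomplete record: resourceType is not filled in yet.
record = {
    "identifier": "https://example.org/dataset/0001",
    "creator": ["Example Curator"],
    "title": "GDP per capita, example dataset",
    "publisher": "Example Data Observatory",
    "publicationYear": 2021,
}
```

Running `missing_mandatory(record)` flags the absent `resourceType` field, so the incomplete record is caught before release instead of after a failed citation or audit.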
&lt;h2 id=&#34;map-your-data-bibliographis-catalogues-codebooks-versioning&#34;&gt;Map your data: bibliographies, catalogues, codebooks, versioning&lt;/h2&gt;
&lt;p&gt;Updating descriptive metadata &amp;ndash; such as bibliographic citation files, descriptions and sources for data files downloaded from the internet, or the versioning of spreadsheet documents and presentations &amp;ndash; is usually a hated and often neglected task within organizations, and rightly so: these boring and error-prone tasks are best left to computers.&lt;/p&gt;














&lt;figure  id=&#34;figure-already-adjusted-spreadsheets-are-re-adjusted-and-re-checked-hours-are-spent-on-looking-for-the-right-document-with-the-rigth-version-duplicates-multiply-already-downloaded-data-is-downloaded-again-and-miscategorized-again-finding-the-data-without-map-is-a-treasure-hunt-photo--nhttpsunsplashcomphotosrfid0_7kep4utm_sourceunsplash&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/gems/n-RFId0_7kep4-unsplash.jpg&#34; alt=&#34;Already adjusted spreadsheets are re-adjusted and re-checked. Hours are spent on looking for the right document with the right version. Duplicates multiply. Already downloaded data is downloaded again, and miscategorized, again. Finding the data without a map is a treasure hunt. Photo: © [N.](https://unsplash.com/photos/RFId0_7kep4?utm_source=unsplash)&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      Already adjusted spreadsheets are re-adjusted and re-checked. Hours are spent on looking for the right document with the right version. Duplicates multiply. Already downloaded data is downloaded again, and miscategorized, again. Finding the data without a map is a treasure hunt. Photo: © &lt;a href=&#34;https://unsplash.com/photos/RFId0_7kep4?utm_source=unsplash&#34;&gt;N.&lt;/a&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;The lack of time and resources spent on documentation reduces reusability over time and significantly increases data processing and supervision or auditing costs.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Our observatory metadata is compliant with the &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/cross-domain-attribute/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Dublin Core Cross-Domain Attribute Set&lt;/a&gt; metadata standard, but we use different formatting. We offer simple re-formatting from the richer DataCite to Dublin Core for interoperability with a wider set of data sources.&lt;/li&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; We use all &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mandatory&lt;/a&gt; DataCite metadata fields, and all the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;recommended and optional&lt;/a&gt; ones.&lt;/li&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; It complies with the tidy data principles.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words: the data is very easy to import into your databases or join with other databases, and the information is easy to find.  Corrections and updates can be managed automatically.&lt;/p&gt;
&lt;h2 id=&#34;what-happened-with-the-data-before&#34;&gt;What happened with the data before?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; We are creating codebooks that follow the SDMX statistical metadata codelists and resemble the SDMX concepts used by international statistical agencies. (See more technical information &lt;a href=&#34;https://r.dataobservatory.eu/articles/codebook.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Small organizations often cannot afford to have data engineers and data scientists on staff, and instead employ analysts who work with Excel, OpenOffice, PowerBI, SPSS or Stata.  The problem with these applications is that they often require the user to adjust the data manually, with keyboard entries or mouse clicks.  Furthermore, they do not provide precise logging of the data processing and manipulation history.
Manual data processing and manipulation is very error-prone, and it makes complex, high-value resources, such as harmonized surveys or symmetric input-output tables, to name two important sources we deal with, impossible to use.  The use of these high-value data sources often requires tens of thousands of data processing steps: no human can perform them faultlessly.&lt;/p&gt;
&lt;p&gt;What is even more problematic is that simple analysis applications do not provide a log of these manipulation steps: pulling over a column with the mouse, renaming a row, adding a zero to an empty cell. This makes senior supervisory oversight and external audit very costly.&lt;/p&gt;
&lt;p&gt;Our data comes with full history: all changes are visible, and we even open the code or algorithm that processed the raw data.  Your analysts can still use their favourite spreadsheet or statistical software application, but they can start from a clean, tidy dataset, with all data wrangling, currency and unit conversion, imputation and other low-priority but important tasks done and logged.&lt;/p&gt;
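The logging described above can be sketched in a few lines. The codes "A" (normal value) and "E" (estimated value) come from the SDMX CL_OBS_STATUS codelist; the midpoint imputation itself is only an illustration of how each processed value gets a status entry, not the observatory's actual imputation method.

```python
# Impute a single interior gap in a time series and log an SDMX-style
# observation status for every value: "A" (normal) or "E" (estimated).
def impute_with_log(series):
    out = []
    for i, (period, value) in enumerate(series):
        if value is None and 0 < i < len(series) - 1:
            prev_v = series[i - 1][1]
            next_v = series[i + 1][1]
            # Illustrative midpoint imputation, logged as estimated.
            out.append((period, (prev_v + next_v) / 2, "E"))
        else:
            out.append((period, value, "A"))
    return out

series = [("2018", 10.0), ("2019", None), ("2020", 14.0)]
logged = impute_with_log(series)
# logged[1] records both the imputed value and the fact that it
# was estimated, so an auditor can trace every changed observation.
```

Because the status code travels with the value, a reviewer can reproduce or challenge any individual imputation without re-running the whole pipeline.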
</description>
    </item>
    
    <item>
      <title>Metadata</title>
      <link>/data/metadata/</link>
      <pubDate>Tue, 01 Jun 2021 11:00:00 +0000</pubDate>
      <guid>/data/metadata/</guid>
      <description>&lt;p&gt;Our observatory has a new data API which allows access to our daily refreshing open data. You can access the API via &lt;a href=&#34;http://api.greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;All the data and the metadata are available as open data, without database use restrictions, under the &lt;a href=&#34;https://opendatacommons.org/licenses/odbl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ODbL&lt;/a&gt; license. However, the metadata contents are not finalized yet. We are currently working on a solution that applies the &lt;a href=&#34;http://www.nature.com/articles/sdata201618&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;FAIR Guiding Principles for scientific data management and stewardship&lt;/a&gt;, and fulfills the mandatory requirements of the Dublin Core metadata standard and, at the same time, the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;mandatory requirements&lt;/a&gt; and most of the &lt;a href=&#34;https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;recommended requirements&lt;/a&gt; of DataCite. These changes will be effective before 1 July 2021.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;Competition Data Observatory&lt;/strong&gt; temporarily shares an API with the &lt;a href=&#34;https://economy.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Economy Data Observatory&lt;/a&gt;, which serves as an incubator for similar economy-oriented reproducible research resources.&lt;/p&gt;














&lt;figure  id=&#34;figure-apigreendealdataobservatoryeuhttpsapigreendealdataobservatoryeudatabasemetadata-descriptive-metadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/GDO_API_metadata_table.png&#34; alt=&#34;[api.greendeal.dataobservatory.eu](https://api.greendeal.dataobservatory.eu/database/metadata) descriptive metadata&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/metadata&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt; descriptive metadata
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;h2 id=&#34;descriptive-metadata&#34;&gt;Descriptive Metadata&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Identifier&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;An unambiguous reference to the resource within a given context. (Dublin Core item, but several identifiers are allowed, and we will use several of them.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Creator&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The main researchers involved in producing the data, or the authors of the publication, in priority order. To supply multiple creators, repeat this property. (Extends the Dublin Core with multiple authors, and legal persons, and adds affiliation data.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Title&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A name given to the resource. Extends Dublin Core with alternative title, subtitle, translated title, and other title(s).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Publisher&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role. For software, use Publisher for the code repository. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Publication Year&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The year when the data was or will be made publicly available.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Resource Type&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We publish Datasets, Images, Reports, and Data Papers. (Dublin Core item with controlled vocabulary.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
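Since the Publisher property is used to formulate the citation, the mandatory fields above are already enough to assemble one automatically. A minimal sketch (the record is hypothetical, and the layout follows a common DataCite-style Creator (Year): Title. Publisher. Identifier pattern):

```python
def format_citation(record):
    """Assemble a DataCite-style citation from the mandatory properties."""
    creators = "; ".join(record["creator"])
    return "{} ({}): {}. {}. {}".format(
        creators,
        record["publicationYear"],
        record["title"],
        record["publisher"],
        record["identifier"],
    )

# A hypothetical record using the fields from the table above.
record = {
    "identifier": "https://example.org/dataset/0001",
    "creator": ["Example Curator"],
    "title": "GDP per capita, example dataset",
    "publisher": "Example Data Observatory",
    "publicationYear": 2021,
}
citation = format_citation(record)
```

Because every element comes from a mandatory field, the citation can be regenerated whenever the metadata changes, instead of being typed and maintained by hand.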
&lt;h3 id=&#34;recommended-for-discovery&#34;&gt;Recommended for discovery&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Recommended&lt;/strong&gt; (R) properties are optional, but strongly recommended for interoperability.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Subject&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The topic of the resource. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Contributor&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource. (Extends the Dublin Core with multiple authors, and legal persons, and adds affiliation data.) When applicable, we add Distributor (of the datasets and images), Contact Person, Data Collector, Data Curator, Data Manager, Hosting Institution, Producer (for images), Project Manager, Researcher, Research Group, Rightsholder, Sponsor, Supervisor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Date&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A point or period of time associated with an event in the lifecycle of the resource, besides the Dublin Core minimum we add Collected, Created, Issued, Updated, and if necessary, Withdrawn dates to our datasets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Related Identifier&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;An identifier or identifiers other than the primary Identifier applied to the resource being registered.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Rights&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give &lt;a href=&#34;https://spdx.org/licenses/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SPDX License List&lt;/a&gt; standards rights description with URLs to the actual license. (Dublin Core item: Rights Management)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Description&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A description of the resource, recommended for discovery. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;GeoLocation&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Similar to Dublin Core item Coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;Subject&lt;/code&gt; property: we need to set standard coding schemas for each observatory.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Contributor&lt;/code&gt; property:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;DataCurator&lt;/code&gt; the curator of the dataset, who sets the mandatory properties.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DataManager&lt;/code&gt; the person who keeps the dataset up-to-date.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ContactPerson&lt;/code&gt; the person who can be contacted for reuse requests or bug reports.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Date&lt;/code&gt; property contains the following dates, which are set automatically by the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Updated&lt;/code&gt; when the dataset was updated;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;EarliestObservation&lt;/code&gt;, which is the earliest observation that is not backcasted, estimated or imputed;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LatestObservation&lt;/code&gt;, which is the latest observation that is not forecasted, estimated or imputed;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;UpdatedatSource&lt;/code&gt;, when the raw data source was last updated.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;GeoLocation&lt;/code&gt; is automatically created by the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Description&lt;/code&gt; property has optional elements, and we adopted them as follows for the observatories:
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;Abstract&lt;/code&gt; is a short, textual description; we try to automate its creation as much as possible, but some curatorial input is necessary.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;TechnicalInfo&lt;/code&gt; sub-field, we automatically record the &lt;code&gt;utils::sessionInfo()&lt;/code&gt; for computational reproducibility. This is automatically created by the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;Other&lt;/code&gt; sub-field, we record the keywords for structuring the observatory.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;optional&#34;&gt;Optional&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;Optional&lt;/strong&gt; (O) properties are optional and provide richer description. For findability they are not so important, but to create a web service, they are essential. In the mandatory and recommended fields, we are following other metadata standards and codelists, but in the optional fields we have to build up our own system for the observatories.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Language&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;A language of the resource. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Alternative Identifier&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;An identifier or identifiers other than the primary Identifier applied to the resource being registered.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Size&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give the CSV, downloadable dataset size in bytes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Format&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give file format information. We mainly use CSV and JSON, and occasionally rds and SPSS types. (Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Version&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The version number of the resource.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Rights&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give &lt;a href=&#34;https://spdx.org/licenses/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SPDX License List&lt;/a&gt; standards rights description with URLs to the actual license. (Dublin Core item: Rights Management)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Funding Reference&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We provide the funding reference information when applicable. This is usually mandatory with public funds.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Related Item&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give information about our observatory partners&#39; related research products, awards, grants (also Dublin Core item as Relation.) We particularly include source information when the dataset is derived from another resource (which is a Dublin Core item.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;In the &lt;code&gt;Language&lt;/code&gt; we only use English (eng) at the moment.&lt;/li&gt;
&lt;li&gt;By default we do not use the &lt;code&gt;Alternative Identifier&lt;/code&gt; property. We will do so when the same dataset is used in several observatories.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Size&lt;/code&gt; property is measured in bytes for the CSV representation of the dataset. During creation, the software writes a temporary CSV file to check that the dataset has no writing problems, and measures the dataset size.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Version&lt;/code&gt; property needs further work. For a daily refreshing API we need to find an applicable versioning system.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Funding reference&lt;/code&gt; will contain information for donors, sponsors, and co-financing partners.&lt;/li&gt;
&lt;li&gt;Our default setting for &lt;code&gt;Rights&lt;/code&gt; is the &lt;a href=&#34;https://spdx.org/licenses/CC-BY-NC-SA-4.0.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CC-BY-NC-SA-4.0&lt;/a&gt; license, and we provide a URI for the license document.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;RelatedItem&lt;/code&gt; we give information about:
&lt;ul&gt;
&lt;li&gt;The original (raw) data source.&lt;/li&gt;
&lt;li&gt;Methodological bibliography references, when needed.&lt;/li&gt;
&lt;li&gt;The open-source statistical software code that processed the data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;processing-metadata&#34;&gt;Administrative (Processing) Metadata&lt;/h2&gt;
&lt;p&gt;As with diamonds, it is better to know the history of a dataset, too. Our administrative metadata contains codelists that follow the SDMX statistical metadata standards, and similarly structured information about the processing history of the dataset.&lt;/p&gt;














&lt;figure  id=&#34;figure-apigreendealdataobservatoryeuhttpsapigreendealdataobservatoryeudatabasecodebook-processing-metadata&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/observatory_screenshots/GDO_API_codebook_table.png&#34; alt=&#34;[api.greendeal.dataobservatory.eu](https://api.greendeal.dataobservatory.eu/database/codebook) processing metadata&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption data-pre=&#34;Figure&amp;nbsp;&#34; data-post=&#34;:&amp;nbsp;&#34; class=&#34;numbered&#34;&gt;
      &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/database/codebook&#34;&gt;api.greendeal.dataobservatory.eu&lt;/a&gt; processing metadata
    &lt;/figcaption&gt;&lt;/figure&gt;
&lt;p&gt;For further reference, see &lt;a href=&#34;https://r.dataobservatory.eu/articles/codebook.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The codebook Class&lt;/a&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th style=&#34;text-align:left&#34;&gt;Field&lt;/th&gt;
&lt;th style=&#34;text-align:center&#34;&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Observation Status&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;SDMX Code list for &lt;a href=&#34;https://sdmx.org/?sdmx_news=new-version-of-code-list-for-observation-status-version-2-2&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Observation Status 2.2&lt;/a&gt; (CL_OBS_STATUS) values, such as actual, missing, or imputed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Method&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;If the value is estimated, we provide modelling information.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Unit&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We provide the measurement unit of the data (when applicable).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Frequency&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;&lt;a href=&#34;https://sdmx.org/?page_id=3215/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;SDMX Code list for Frequency 2.1 (CL_FREQ)&lt;/a&gt; frequency values&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Codelist&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;Eurostat SDMX Codelist entries for the observational units, such as sex, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Imputation&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;SDMX Code list for Imputation Methods (CL_IMPUT_METH) values.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Estimation&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;The estimation methodology of data that we calculated, together with citation information and a URI to the actual processing code.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&#34;text-align:left&#34;&gt;Related Item&lt;/td&gt;
&lt;td style=&#34;text-align:center&#34;&gt;We give information about the software code that processed the data (both Dublin Core and DataCite compliant).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;See an example in the &lt;a href=&#34;https://r.dataobservatory.eu/articles/codebook.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;The codebook Class&lt;/a&gt; article of the &lt;a href=&#34;https://r.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;dataobservatory R package&lt;/a&gt;.&lt;/p&gt;
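To make the table above concrete, here is an illustrative sketch (in Python, whereas the actual implementation is the dataobservatory R package) of how mean imputation can record processing metadata per observation. The SDMX CL_OBS_STATUS codes used (&#34;A&#34; = normal/actual value, &#34;E&#34; = estimated value) are assumed from the standard codelist; the field names are hypothetical.

```python
# Illustrative sketch: impute missing values and tag every observation
# with an SDMX-style observation status and the method used.
observations = [
    {"geo": "NL", "year": 2019, "value": 4.2},
    {"geo": "SK", "year": 2019, "value": None},  # missing in the raw source
]

def impute_mean(observations):
    """Replace missing values with the mean of the observed values,
    recording observation status and method for each data point."""
    observed = [o["value"] for o in observations if o["value"] is not None]
    mean = sum(observed) / len(observed)
    for o in observations:
        if o["value"] is None:
            o["value"] = mean
            o["obs_status"] = "E"  # CL_OBS_STATUS: estimated value
            o["method"] = "mean imputation"  # modelling information
        else:
            o["obs_status"] = "A"  # CL_OBS_STATUS: normal (actual) value
    return observations
```

A downstream user can then filter or weight estimated values instead of silently consuming them, which is the point of shipping processing metadata alongside the data.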
</description>
    </item>
    
    <item>
      <title>Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies</title>
      <link>/post/2021-02-13-european-visibility/</link>
      <pubDate>Sat, 13 Feb 2021 18:10:00 +0200</pubDate>
      <guid>/post/2021-02-13-european-visibility/</guid>
<description>&lt;p&gt;The majority of music sales in the world are driven by AI-powered algorithms that create personalized playlists and recommendations and help program radio music streams or festival lineups. It is critically important that an artist’s work is documented and described in a way that these algorithms can work with.&lt;/p&gt;
&lt;p&gt;In our research paper – soon to be published – made for the Listen Local Initiative, we found that 15% of Dutch, Estonian, Hungarian, or Slovak artists had no chance of being recommended, and they usually end up on &lt;a href=&#34;post/2020-11-17-recommendation-analysis/&#34;&gt;Forgetify&lt;/a&gt;, an app that lists Spotify’s never-played songs. In another project with rights management organizations, we found that about half of the rightsholders are at risk of not getting all their royalties from the platforms because of poor documentation.&lt;/p&gt;
&lt;p&gt;But how come distributors give streaming platforms songs that are not properly documented?  What sort of information is missing for the European repertoire’s visibility?  Reprex is exploring this problem in a practical cooperation with SOZA, the Slovak Performing and Mechanical Rights Society, and in an academic cooperation that involves leading researchers in the field. In a manuscript co-authored with Martin Senftleben, director of the &lt;a href=&#34;https://www.ivir.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Institute for Information Law&lt;/a&gt; in Amsterdam, and eminent researchers in copyright law and music economics, Reprex’s co-founder makes the case that Europe must invest public money to resolve this problem, because in the current scenario the documentation costs of a song exceed its expected income from streaming platforms.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the European Strategy for Data, the European Commission highlighted the EU’s ambition to acquire a leading role in the data economy. At the same time, the Commission conceded that the EU would have to increase its pools of quality data available for use and re-use. In the creative industries, this need for enhanced data quality and interoperability is particularly strong. Without data improvement, unprecedented opportunities for monetising the wide variety of EU creative content and making this content available for new technologies, such as artificial intelligence training systems, will most probably be lost. The problem has a worldwide dimension. While the US have already taken steps to provide an integrated data space for music as of 1 January 2021, the EU is facing major obstacles not only in the field of music but also in other creative industry sectors. Weighing costs and benefits, there can be little doubt that new data improvement initiatives and sufficient investment in a better copyright data infrastructure should play a central role in EU copyright policy. A trade-off between data harmonisation and interoperability on the one hand, and transparency and accountability of content recommender systems on the other, could pave the way for successful new initiatives. &lt;a href=&#34;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3785272&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Download the manuscript from SSRN&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Our &lt;a href=&#34;post/2020-12-17-demo-slovak-music-database/&#34;&gt;Slovak Demo Music Database&lt;/a&gt; project is a good example of this. We started to systematically collect publicly available information about Slovak artists (in our write-in process) and to ask them to provide further, GDPR-protected data (in our opt-in process) to create a comprehensive database that can serve recommendation engines as well as market-targeting or educational AI apps.&lt;/p&gt;
&lt;p&gt;We believe that one of the problems with current AI algorithms is that they work solely, or almost exclusively, with English-language documentation, putting other repertoires, particularly those in smaller languages, at risk of being buried beneath well-documented music arriving mainly from the United States.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;We are looking for rightsholders and their organizations, artists,
researchers to work with us to find out how we can increase the visibility of European music.&lt;/em&gt;&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
