Covid-19 likely source data - inconsistent caseID data - help!

Hi there Ms data,

This data is critical for NSW at the moment - why is it such a mess?
Thanks for fixing the dates in the CSV data, but now I see that there are other inconsistencies.

The case_IDs are not consistent from day to day:
Case 1 on the 9-Apr is not case 1 on the 10-Apr.

CaseID consistency is critical for analysing how a case's classification progresses from one likely source to another,
e.g. how many cases were initially classified as Known-local and later re-classified as Overseas-Interstate?

I’ll pick the oldest cases as these are probably not changing state.
In the 8-Apr report there are 8 cases raised on the 9-Mar: 5 are Overseas or interstate (O) and 3 are Locally acquired - known (Lk). These sub-totals are the same in the 9-Apr, 10-Apr, 11-Apr… reports, BUT the classification attached to each caseID (#) is not the same each time.

#    8-Apr  9-Apr  10-Apr  11-Apr
1    Lk     Lk     O       Lk
2    O      O      O       Lk
3    Lk     O      Lk      O
4    O      O      O       O
5    Lk     O      Lk      O
6    O      Lk     O       O
7    O      O      O       Lk
8    O      Lk     Lk      O
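
For anyone who wants to check this themselves, here is a rough Python sketch using only the values typed into the table above. It assumes the four columns are the 8-Apr to 11-Apr reports; the names `report_days`, `rows`, `subtotals_by_day` and `ids_that_change` are made up for this illustration and are not part of the dataset.

```python
from collections import Counter

# Classifications copied from the table above.
# Key = ID shown in the preview, list = one label per report day (assumed order).
report_days = ["8-Apr", "9-Apr", "10-Apr", "11-Apr"]
rows = {
    1: ["Lk", "Lk", "O", "Lk"],
    2: ["O", "O", "O", "Lk"],
    3: ["Lk", "O", "Lk", "O"],
    4: ["O", "O", "O", "O"],
    5: ["Lk", "O", "Lk", "O"],
    6: ["O", "Lk", "O", "O"],
    7: ["O", "O", "O", "Lk"],
    8: ["O", "Lk", "Lk", "O"],
}


def subtotals_by_day(rows, days):
    """Count how many IDs carry each label in each report."""
    return {
        day: Counter(labels[i] for labels in rows.values())
        for i, day in enumerate(days)
    }


def ids_that_change(rows):
    """Return the IDs whose label differs between at least two reports."""
    return sorted(cid for cid, labels in rows.items() if len(set(labels)) > 1)


totals = subtotals_by_day(rows, report_days)
changed = ids_that_change(rows)
```

With the table as typed, `totals` holds 5 O and 3 Lk for every report, while `changed` lists every ID except 4, which is what you would expect if the rows are simply being re-ordered between reports rather than re-classified.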

Can we please get the caseID to be a unique, consistent identifier?

  • Matthew

Hi Matthew,

The ID column is not actually part of the dataset. It appears in the preview (and API) as a result of how the data is ingested on CKAN. We are working to correct this. We do apologise for the confusion this has created.

I recommend downloading the raw dataset here.

Please otherwise disregard the ID column, as it is not a unique identifier.

Hi Lance,
The linked dataset does not have the “likely source of infection” information. Do you have a dataset with that? It is the evolution of the “likely source” that I’m attempting to analyse.

  • Matthew

Hi Matthew,

Yes, you can access that data here:

I hope this helps.
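
Since the ID column can't be relied on, one possible workaround is to compare two daily downloads by counting rows per date and likely source, then looking at which counts moved. The sketch below is only an illustration: the column names `notification_date` and `likely_source_of_infection`, the helper names, and the sample rows are all assumptions, so adjust them to match the file you actually download.

```python
import csv
import io
from collections import Counter


def source_counts(csv_text,
                  date_col="notification_date",            # assumed column name
                  source_col="likely_source_of_infection"):  # assumed column name
    """Count rows per (date, likely source) pair in one CSV snapshot."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row[date_col], row[source_col]) for row in reader)


def count_changes(earlier, later):
    """Return the net change in each (date, source) count between two snapshots."""
    keys = set(earlier) | set(later)
    return {
        key: later[key] - earlier[key]
        for key in sorted(keys)
        if later[key] != earlier[key]
    }


# Made-up sample rows, only to show the shape of the input.
snapshot_a = (
    "notification_date,likely_source_of_infection\n"
    "2020-03-09,Overseas\n"
    "2020-03-09,Locally acquired\n"
)
snapshot_b = (
    "notification_date,likely_source_of_infection\n"
    "2020-03-09,Overseas\n"
    "2020-03-09,Overseas\n"
)
changes = count_changes(source_counts(snapshot_a), source_counts(snapshot_b))
```

Note that this only gives net movements per date. Without a stable case identifier it cannot tell you which individual case was re-classified, and two offsetting changes on the same date would cancel out.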

Does anyone know why some records (rows) have no postcode?

© Data.NSW