OpenSAFELY-TPP database builds

Description: This report provides information about data import dates and recent activity in the OpenSAFELY-TPP database.
Contact: Get in touch to tell us how you use this report or to suggest new features you'd like to see: [email protected]
First published: 06 Jan 2021
Last updated: 20 Jul 2021

Import dates and date coverage for OpenSAFELY-TPP data sources

This OpenSAFELY notebook provides data import dates and counts of raw event data for externally linked data sources in the OpenSAFELY-TPP database. It is part of the technical documentation for users of the OpenSAFELY platform to guide analyses and it should not be used for inference about any aspect of the pandemic, public health, or health service activity. For the avoidance of doubt: any clinical or epidemiological interpretation of this raw information is likely to be a misinterpretation.

If you would like to apply to use the OpenSAFELY platform, please read our documentation, the principles of the platform, and the information about our pilot programme for onboarding external users.

If you want to see the Python code used to create this notebook, you can view it on GitHub.

Data sources

The core SystmOne primary care datasets are held in the S1 tables in the OpenSAFELY-TPP database. Events recorded in SystmOne become available in OpenSAFELY-TPP after a delay of around 2 to 9 days. Where necessary for urgent queries, this delay can be reduced to one day.

All externally-linked data sources are listed below, with the table name given in brackets:

  • All positive or negative SARS-CoV-2 tests, from SGSS (SGSS_AllTests_Positive and SGSS_AllTests_Negative)
  • First-ever positive or negative SARS-CoV-2 test, from SGSS (SGSS_Positive and SGSS_Negative)
  • A&E attendances, from SUS Emergency Care data (EC)
  • In-patient hospital admissions, from SUS Admitted Patient Care Spells data (APCS)
  • Out-patient hospital appointments, from SUS (OPA)
  • Covid-related ICU admissions, from ICNARC (ICNARC)
  • Covid-related in-hospital deaths, from CPNS (CPNS)
  • COVID-19 Infection Survey, from ONS (ONS_CIS)
  • All-cause registered deaths, from ONS (ONS_Deaths)
  • High cost drugs (HighCostDrugs)
  • Unique Property Reference Number, used for deriving household variables (UPRN)
  • Master Patient Index (MPI)
  • Health and Social Care Worker identification, collected at the point of vaccination (HealthCareWorker)

Some of these tables are accompanied by additional tables with further data. For instance, OPA contains the core out-patient appointment event data and is supplemented by the OPA_Cost, OPA_Diag and OPA_Proc tables. See the data schema notebook for more information.

Notebook run date

This notebook was run on 20 July 2021. The information below reflects the state of the OpenSAFELY-TPP database as at this date.

Latest dataset import dates

TPP create a snapshot of the primary care information captured in the SystmOne database, which is processed (for example, unstructured free text is removed and other OpenSAFELY-specific tables are created) before being imported into the OpenSAFELY-TPP database. TPP also receive (or "ingest") external datasets from NHS Digital, ONS, etc., which are processed and imported into OpenSAFELY-TPP. Each imported dataset overwrites previously imported data.

Once a dataset has been imported, it can be queried immediately in the secure environment. SystmOne data is imported approximately weekly. External datasets are usually imported within a few days of being received by TPP. They are received at different times, sometimes irregularly or with unexpected delays.

The dates in the table below reflect when the datasets were last imported into the OpenSAFELY-TPP database. They do not reflect when the data were received by TPP nor when the latest clinical or administrative events captured in each dataset occurred.

datasource              latest_import
APCS                    2021-07-19 14:13:58.067
CPNS                    2021-07-19 10:36:41.650
EC                      2021-07-19 11:22:32.937
ECDS                    2020-04-21 14:48:09.543
HealthCareWorker        2021-07-19 11:22:44.387
HighCostDrugs           2020-11-27 14:15:31.460
ICNARC                  2021-01-21 11:21:12.623
MPI                     2020-08-05 11:26:13.247
ONS_CIS                 2021-04-28 14:59:47.487
ONS_Deaths              2021-07-19 14:14:33.910
OPA                     2021-07-19 12:26:26.000
S1                      2021-07-16 11:00:00.087
SGSS_AllTests_Negative  2021-07-06 12:59:21.370
SGSS_AllTests_Positive  2021-07-06 12:46:56.473
SGSS_Negative           2021-07-06 12:46:07.920
SGSS_Positive           2021-07-06 12:43:54.393
UPRN                    2020-08-05 11:30:17.087
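
As a rough illustration of how a "latest import" table like this could be derived, the sketch below takes the most recent import timestamp per data source from an import log and reports how old that import is on a given run date. It is not the notebook's actual code: the `import_log` frame, its column names, the earlier import timestamps and the `latest_imports` helper are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical import log: one row per import of each data source.
# The earlier (superseded) timestamps are invented for this example.
import_log = pd.DataFrame(
    {
        "datasource": ["APCS", "APCS", "S1", "S1", "ICNARC"],
        "import_date": pd.to_datetime(
            [
                "2021-07-12 14:02:11",
                "2021-07-19 14:13:58",
                "2021-07-09 11:00:00",
                "2021-07-16 11:00:00",
                "2021-01-21 11:21:12",
            ]
        ),
    }
)


def latest_imports(log: pd.DataFrame, run_date: str) -> pd.DataFrame:
    """Return the most recent import per data source and its age in whole days."""
    latest = (
        log.groupby("datasource", as_index=False)["import_date"]
        .max()
        .rename(columns={"import_date": "latest_import"})
    )
    age = pd.Timestamp(run_date) - latest["latest_import"]
    latest["days_since_import"] = age.dt.days
    return latest.sort_values("datasource").reset_index(drop=True)


print(latest_imports(import_log, "2021-07-20"))
```

A large `days_since_import` value only shows that the dataset has not been re-imported recently; as noted above, it says nothing about when the data were received by TPP or when the latest events in the dataset occurred.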

All dataset import dates

The figure below shows all dataset import dates for SystmOne and external datasets, up until the date this notebook was run (the vertical black line).

[Figure: "Latest dataset import dates as at 20 July 2021"]

Event activity in external datasets

In the figures below, event activity (counts of events such as hospital admissions and deaths) is reported for external data sources from 1 February 2020 up to the notebook run date (left plot), and for the latest 30 days of activity up to the most recent event date (right plot).

The left plots can be used to gain a rough idea of event activity over time and to sense-check event frequencies and/or population counts in extracted datasets; they should not be used for direct clinical or epidemiological inference. The right plots can be used to gauge the latest reliable date for events recorded in each data source, i.e. a cut-off beyond which the data may be incomplete.

Note that the OpenSAFELY-TPP database only includes people who were ever registered at a GP practice using TPP's SystmOne clinical information system (roughly 40% of GP practices) on or after 1 January 2009, including those who have since deregistered or died. The data therefore capture activity for these patients only.

Counts of five or fewer are redacted.
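
The counting and redaction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's implementation: the `recent_daily_counts` helper, its parameters and the sample dates are hypothetical. It counts events per day over a window ending at the most recent event date and replaces any count of five or fewer with a missing value.

```python
import pandas as pd

REDACTION_THRESHOLD = 5  # counts of five or fewer are suppressed


def recent_daily_counts(event_dates: pd.Series, window_days: int = 30) -> pd.Series:
    """Daily event counts for the last `window_days` days up to the latest
    event date, with counts at or below the threshold set to missing."""
    days = pd.to_datetime(event_dates).dt.normalize()
    # The window ends on the most recent event date, not on the run date.
    window = pd.date_range(end=days.max(), periods=window_days, freq="D")
    counts = days.value_counts().reindex(window, fill_value=0)
    # Keep only counts above the threshold; everything else becomes NaN.
    return counts.where(counts > REDACTION_THRESHOLD)


# Hypothetical event dates: 7, 3 and 6 events on three consecutive days.
sample = pd.Series(["2021-07-01"] * 7 + ["2021-07-02"] * 3 + ["2021-07-03"] * 6)
print(recent_daily_counts(sample, window_days=3))
```

In this sketch, days with no events are also returned as missing, because zero falls below the threshold; whether zero counts should be shown is a design choice that this example does not settle.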

Recurrent events / repeat patient IDs

Some datasets may have multiple rows per patient, for instance if the patient was admitted to hospital more than once. Currently, a study definition can return either the first event, the last event, or the count of events occurring during the period of interest. The tables below count recurrent events for each dataset from 1 February 2020 onwards, up to 5 events.

patients_with_at_least_1_events is the number of unique patients in the dataset. This is the number of events that can be returned by a study variable that takes the first event or the last event, from 1 February 2020 onwards.
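
To make the two column definitions concrete, the sketch below builds a repeat-event table from a column of patient IDs with one row per event. The `repeat_event_table` helper and the sample IDs are hypothetical, and the sketch applies no redaction; it is only meant to show how the two counts relate.

```python
import pandas as pd


def repeat_event_table(patient_ids: pd.Series, max_events: int = 5) -> pd.DataFrame:
    """Count patients with at least X and exactly X events, for X = 1..max_events."""
    events_per_patient = patient_ids.value_counts()
    rows = []
    for x in range(1, max_events + 1):
        rows.append(
            {
                "X": x,
                "patients_with_at_least_X_events": int((events_per_patient >= x).sum()),
                "patients_with_exactly_X_events": int((events_per_patient == x).sum()),
            }
        )
    table = pd.DataFrame(rows).set_index("X")
    # Drop rows for event counts that no patient reaches.
    return table[table["patients_with_at_least_X_events"] > 0]


# Hypothetical data: patient 1 has three events, patient 2 has two,
# patients 3 and 4 have one each (seven events in total).
ids = pd.Series([1, 1, 1, 2, 2, 3, 4])
print("total_events:", len(ids))
print(repeat_event_table(ids))
```

The two columns are linked: the patients with at least X events, minus those with exactly X events, are the patients with at least X + 1 events. In the APCS table, for example, 4107803 - 2588476 = 1519327.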

Repeat events in APCS

total_events: 8449767

X  patients_with_at_least_X_events  patients_with_exactly_X_events
1  4107803                          2588476
2  1519327                          797116
3  722211                           308630
4  413581                           144623
5  268958                           76820

Repeat events in OPA

total_events: 61750795

X  patients_with_at_least_X_events  patients_with_exactly_X_events
1  10250762                         2323499
2  7927263                          1652919
3  6274344                          1209827
4  5064517                          922136
5  4142381                          704740

Repeat events in CPNS

total_events: 35602

X  patients_with_at_least_X_events  patients_with_exactly_X_events
1  35602                            35602

Repeat events in EC

total_events: 10889421

X  patients_with_at_least_X_events  patients_with_exactly_X_events
1  6248116                          3998642
2  2249474                          1288561
3  960913                           483162
4  477751                           213489
5  264262                           103920

Repeat events in ICNARC

total_events: 11273

X  patients_with_at_least_X_events  patients_with_exactly_X_events
1  9468                             8005
2  1463                             1189
3  274                              221
4  53                               42
5  11                               7

Repeat events in SGSS_Positive

total_events: 1570788

X  patients_with_at_least_X_events  patients_with_exactly_X_events
1  1570449                          1570110
2  339                              339

Repeat events in SGSS_AllTests_Positive

total_events: 1917913

X  patients_with_at_least_X_events  patients_with_exactly_X_events
1  1574586                          1332582
2  242004                           184287
3  57717                            37197
4  20520                            11518
5  9002                             3926

Repeat events in ONS_Deaths

total_events: 344158

X  patients_with_at_least_X_events  patients_with_exactly_X_events
1  344158                           344158
