P0261 – Turning Text into Data

P0261 - Turning Text into Data: An Exploration of the Potential of Natural Language Processing Techniques to Extract Information from Parole Board Decision Letters - Privacy Notice

Principal Investigator

Miss Erica Kane, Data Analytics and Society PhD student at the University of Leeds.

About the research

The project will be conducted as the main research for the PhD of the principal investigator. The project has two main aims:

  1. To explore how Natural Language Processing techniques can be used to extract decision-relevant information from Parole Board decision letters, and
  2. To analyse extracted data for evidence of disparity on the basis of race and ethnicity.

Both aims will use data from the Parole Board in the form of decision letters. These are produced by a Board member at the end of a hearing. They are PDF documents, two to three pages long.

Aim 1 will be completed using Natural Language Processing algorithms to automatically extract information from the summaries on a large scale and compile it into a dataset. The first stage of this aim will be an anonymisation process which removes all personal identifiers from the data. After this, relevant decision information will be extracted (such as risk factors, crime type, behaviour, gender, or victim statements). This process will be exploratory, as the structure and content of the decision letters will dictate the information which can be extracted and the methods used. Aim 2 will use the dataset produced by Aim 1. The data will be explored to understand the sample, then analysed to identify any relationships between race/ethnicity and release decision.
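As a rough illustration of the anonymisation stage described above, the following Python sketch replaces a known offender name and a case reference number with placeholder tokens. It is not the method used in the project, which is exploratory and has not been fixed. The reference format (REF- followed by six digits), the function name anonymise, and the sample text are all assumptions made for illustration only.

```python
import re
from typing import Iterable

# ASSUMPTION: a hypothetical case reference format such as "REF-123456".
# The real format used in Parole Board letters is not stated in this notice.
CASE_REF_PATTERN = re.compile(r"\bREF-\d{6}\b")


def anonymise(text: str, known_names: Iterable[str]) -> str:
    """Replace case reference numbers and known names with placeholder tokens."""
    # Redact anything that matches the assumed case reference format.
    redacted = CASE_REF_PATTERN.sub("[CASE_REF]", text)
    # Redact each known name, matching whole words and ignoring case.
    for name in known_names:
        name_pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        redacted = name_pattern.sub("[NAME]", redacted)
    return redacted


# Invented sample text, not taken from any real decision letter.
sample = "Case REF-123456: Mr John Smith applied for release. John Smith has engaged well."
anonymised = anonymise(sample, ["John Smith"])
print(anonymised)
```

A rule-based approach like this only catches identifiers that are known in advance or follow a fixed pattern, so in practice it would likely need to be combined with other checks before data could be treated as anonymised.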

Where do we obtain data from?

All data is obtained directly from the Parole Board. They will send all decision letters they have produced over the last two to three years. Each letter relates to an offender who went through an oral or written hearing with the Parole Board.

What data do we hold?

Each letter relates to the case of an individual offender and includes information on the offender (their name and crime(s) committed), an introduction to the case (dates, type of hearing, whether a victim statement was read at the hearing), sentence details (type and length of sentence), a risk assessment (risk factors, behaviour, protective factors), and the decision. There is no location data in the letters, and there are no names or identifiers of anyone else involved in the case or hearing, for example witnesses, victims, or panel members. All letters also include a case reference number.

As the data includes names and unique identification numbers (case reference numbers), which are personal identifiers, it is considered ‘personal data’ under the General Data Protection Regulation. It also includes criminal offence data, which is personal data about criminal convictions and offences.

A decision has not yet been made on where the information on race and ethnicity will come from, as it is not included in the decision letters. The Privacy Notice will be updated when this has been decided with the Parole Board.

Who will process personal information?

The decision letters will only be accessed by Erica Kane, the principal investigator. They will only be used for the purposes of this project. Any publication resulting from the research (presentation, paper, thesis) will include summarised or aggregated data which will not be identifiable to any individual. The data will not be used commercially or provided to any third party. The only transfer of the decision letters will be between the Parole Board and the principal investigator.

What is the legal basis for the processing?

Under the General Data Protection Regulation, a legal basis for processing personal data and criminal offence data must be identified. The legal basis for processing the personal data (i.e. names and case reference numbers) is Article 6(1)(e) of the GDPR: processing is necessary for the performance of a task carried out in the public interest.

The criminal offence data is also processed under the Article 6(1)(e) basis of being necessary for the performance of a task carried out in the public interest. The processing of criminal offence data must also comply with an Article 10 condition. The data is processed for research purposes, which is in compliance with Schedule 1, paragraph 4 of the Data Protection Act 2018.

How will you keep my data secure?

The Parole Board is the data controller for the research and will dictate the terms of processing through a Data Sharing Agreement with the data processor (Erica Kane). As the research is being conducted within the University of Leeds, the University's technical measures can be used to ensure the data is protected.

All data which contains names and case reference numbers will be stored in a LASER Tier 4 Virtual Research Environment (VRE) that complies with ISO27001 security standards. Within the VRE the data flow is controlled and there is no internet access to outside networks. In addition, personal data will only be accessed through a safe room environment. Once the data has been anonymised, it will be transferred to a second, Tier 3 VRE which can be accessed remotely.

If the case reference number is needed to link the source data to any further data provided for race and ethnicity information, the Privacy Notice will be updated to reflect this.

How can I access my personal information?

When information is processed in a research environment the right of access does not apply. This is because fulfilling access requests could affect the study and the integrity with which it is conducted.

However, the data for this research comes directly from the Parole Board, and if you wish to access your personal information from the Board you can read about how to do so in their Privacy Notice.

For how long is my information kept?

All data will be stored by the University until three months after the completion of the PhD (approximately December 2024). After this period, it will be securely destroyed. The data needs to be held for the entire duration of the PhD to allow for iterations in the research, which are necessary for fair and reliable analysis.

Who can I contact?

If you have any questions about the research, please contact Erica Kane on lw16em@leeds.ac.uk.

If you have any questions about the research environments in which the University stores data, you can contact lida@leeds.ac.uk.

How can I complain?

You can contact the Data Protection Officer, Alice Temple, on a.c.temple@leeds.ac.uk.