Introduction and context
1.1 The investigative need
OSINT practitioners operate under data protection law the moment they collect information about an identifiable living person. Under UK GDPR, that collection is lawful only if it has a named basis in Article 6. For journalism and public-interest investigation, that basis is almost always legitimate interests under Article 6(1)(f), qualified by the journalism exemption in Schedule 2 Part 5 of the Data Protection Act 2018. The journalism exemption is conditional, not absolute.
Defensible collection means three things. First, you can name the lawful basis before you start. Second, the scope of what you collect is proportionate to the investigative question. Third, you can explain, after the fact, why each piece of collected data was necessary. Investigators who cannot do this face enforcement risk from the ICO, civil claims from subjects, and editorial rejection from publications whose legal teams will not run the story.
This tutorial walks one investigation type end to end: investigating a named individual suspected of corporate wrongdoing. It gives you a decision flowchart you can apply the same way every time.
1.2 Learning outcomes
- Identify the Article 6 UK GDPR lawful basis for a specific OSINT collection task before starting it.
- Apply the three-part balancing test for legitimate interests and record the reasoning in a form an editor or counsel can review.
- Scope a collection operation to meet the data minimisation principle under Article 5(1)(c).
- Distinguish defensible public-interest collection from conduct that crosses into doxxing, harassment or unauthorised access.
- Produce a written collection decision record that is defensible under an ICO complaint or a civil claim.
1.3 Threat model
- ICO enforcement action for collection that lacks an identifiable lawful basis.
- Civil claims by the subject under Article 82 UK GDPR for unlawful processing.
- Editorial or in-house counsel rejection of a story on the basis that the collection cannot be justified.
Foundational theory and ethical-legal framework
Personal data: any information relating to an identified or identifiable living individual, under Article 4(1) UK GDPR. A name alone qualifies. A social media handle that can be linked to a real person qualifies. The threshold is low.
Processing: any operation performed on personal data, under Article 4(2) UK GDPR. Collection, storage, consultation, retrieval, organisation and dissemination all count. Opening a public profile and copying its contents into your case file is processing.
Lawful basis: one of the six grounds in Article 6(1) UK GDPR that make processing lawful. For OSINT by journalists and investigators, the relevant ground is almost always Article 6(1)(f), legitimate interests. You must name the basis before collection begins.
Journalism exemption: the set of derogations in Schedule 2 Part 5 of the Data Protection Act 2018, which disapply certain GDPR provisions for processing carried out with a view to the publication of journalistic material in the public interest. It is conditional, and it protects processing for publication, not processing for private use.
Data minimisation: the principle in Article 5(1)(c) UK GDPR that personal data shall be adequate, relevant and limited to what is necessary for the stated purpose. Collecting more than the investigation requires is itself a violation, independent of what is done with the excess.
2.2 Ethical and legal boundaries
This tutorial is written against UK GDPR and the Data Protection Act 2018. Where the subject is resident in an EU member state or a third country, additional obligations apply that this tutorial does not cover.
Two specific legal points apply to the collection phase.
First, the journalism exemption in Schedule 2 Part 5 DPA 2018 disapplies certain GDPR provisions only where processing is carried out with a view to the publication of journalistic, academic, artistic or literary material, where the controller reasonably believes publication would be in the public interest, and where compliance with the disapplied provision would be incompatible with that purpose. All three tests must be satisfied. Private collection for future unspecified use does not qualify.
Second, Article 6(1)(f) legitimate interests requires a three-part balancing test: a named legitimate interest, necessity of the processing to achieve it, and a balance between the interest and the data subject's rights and freedoms. The ICO has published guidance treating investigative journalism as a legitimate interest, but the balancing test still applies on each collection.
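The three parts of the balancing test can be held as a small structured record, so that a blank part is caught before collection starts. This is a minimal sketch, not an ICO template: the class name `BalancingTest`, its field names and the example wording are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BalancingTest:
    """One Article 6(1)(f) assessment for one collection task (hypothetical layout)."""
    legitimate_interest: str  # the named interest being pursued
    necessity: str            # why this processing is needed to achieve it
    balance: str              # how the subject's rights and freedoms were weighed
    recorded_on: date         # the date the reasoning was written down

    def missing_parts(self) -> list[str]:
        """Return the names of any of the three parts that were left blank."""
        parts = {
            "legitimate_interest": self.legitimate_interest,
            "necessity": self.necessity,
            "balance": self.balance,
        }
        return [name for name, text in parts.items() if not text.strip()]


# Illustrative draft with the necessity reasoning not yet written.
draft = BalancingTest(
    legitimate_interest="Reporting on suspected corporate wrongdoing",
    necessity="",
    balance="Subject holds a directorship; collection limited to business conduct",
    recorded_on=date(2024, 1, 15),
)
print(draft.missing_parts())  # ['necessity']
```

The check only confirms that each part has some text. Whether the reasoning is adequate remains a judgment for the investigator, editor or counsel.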
2.3 Device compartmentalisation
This tutorial works at the legal and procedural layer, not the device layer. It assumes the reader is already operating from the compartmentalised research environment set up in CZ1: an investigative browser profile separated from personal use, a VPN routing research traffic, and a sockpuppet identity that does not link to the investigator's real identity. Those controls address the question of what the subject can observe during collection. They do not address the question of whether the collection is lawful.
Lawful basis analysis is separate from OPSEC. A perfectly compartmentalised investigation can still be unlawful collection. Conversely, a legally defensible investigation can still leak operator identity to the subject through poor OPSEC. Treat the two as orthogonal controls.
Applied methodology
3.1 Required tools and setup
ICO Guide to UK GDPR: primary reference for lawful basis analysis and data minimisation under UK law. Used as the authority to cite when recording a balancing test. Free, maintained by the ICO.
Data Protection Act 2018, Schedule 2 Part 5: statutory source for the journalism exemption. Read directly, not through secondary summaries. The disapplied provisions and the conditions for disapplication are specific, and summaries routinely simplify them.
A written collection log: plain text file or spreadsheet in the research environment, structured to record the lawful basis, the balancing test where applicable, the scope of collection, and the retention decision for each investigation. The format is less important than the habit. A collection log you do not maintain is worse than none because it creates a false record.
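If the log is kept as a plain text file, one option is an append-only JSON Lines file with one entry per line. The sketch below assumes that layout. The function name `append_log_entry`, the field names and the example values are hypothetical, chosen to mirror the items listed above.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Fields every entry must carry (hypothetical names for the items listed above).
REQUIRED_FIELDS = {"investigation_ref", "lawful_basis", "scope", "retention_decision"}


def append_log_entry(log_path: Path, entry: dict) -> dict:
    """Append one timestamped entry to a JSON Lines collection log."""
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        raise ValueError(f"log entry is missing fields: {sorted(missing)}")
    stamped = {"logged_at_utc": datetime.now(timezone.utc).isoformat(), **entry}
    # Mode "a" adds to the end of the file and never rewrites earlier entries.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(stamped, ensure_ascii=False) + "\n")
    return stamped


# Demonstration in a throwaway directory; a real log lives in the investigation folder.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "collection_log.jsonl"
    saved = append_log_entry(log_file, {
        "investigation_ref": "INV-EXAMPLE-001",
        "lawful_basis": "Article 6(1)(f) legitimate interests",
        "balancing_test": "see balancing-test PDF in this folder",
        "scope": "corporate filings and business conduct only",
        "retention_decision": "review at publication",
    })
    print(saved["lawful_basis"])
```

A timestamp written by your own machine is only weak evidence of when an entry was made, so it supplements the dated record rather than replacing it.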
Case management folder structure: a per-investigation folder that holds the collection log, the collected artefacts and the decision record. Treat its contents as disclosable in response to a subject access request. The structure should make it possible to demonstrate what was collected, when, and under what basis.
3.2 OPSEC and target awareness
Treat documentation as a control that protects you under legal discovery, not as an embarrassment. An investigator who can produce a dated, reasoned collection record is in a materially stronger position than one who cannot, regardless of whether the underlying collection was identical.
3.3 Practical execution
| Tool or operator | Investigative query | Investigative value |
|---|---|---|
| ICO Guide to UK GDPR | Which lawful basis fits this specific collection task | Confirms the named Article 6 ground before processing begins |
| DPA 2018 Schedule 2 Part 5 | Which GDPR provisions can the journalism exemption disapply, given the facts | Identifies disapplications and their conditions |
| Collection log (plain text or spreadsheet) | Record the question, basis, balancing test, scope, exclusions and stop condition | Produces the defensible record |
| Three-part balancing test (written) | Legitimate interest, necessity, balance | Satisfies the Article 6(1)(f) reasoning requirement |
| Scope exclusion list (written) | Which categories of data will not be collected, and why | Evidence of Article 5(1)(c) compliance |
| Stop condition (written) | What ends the collection | Bounds the investigation and supports proportionality |
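A written scope exclusion list can also be checked mechanically against what was captured. The sketch below assumes each captured item has been given a category label by the investigator. The function name `out_of_scope`, the category labels and the example items are hypothetical.

```python
def out_of_scope(captured: list[dict], excluded_categories: set[str]) -> list[dict]:
    """Return the captured items whose category is on the written exclusion list."""
    return [item for item in captured if item["category"] in excluded_categories]


# Illustrative exclusion list and capture index.
excluded = {"family members", "home address", "health"}
captured = [
    {"source": "Companies House officer record", "category": "directorships"},
    {"source": "Personal social media profile", "category": "family members"},
]

flagged = out_of_scope(captured, excluded)
for item in flagged:
    print(item["source"])  # Personal social media profile
```

The check is only as good as the category labels. It flags items for review and deletion; it does not decide whether the exclusion list itself was drawn correctly.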
3.4 Visual documentation standards
Screenshot the collection log entry for each investigation at the point of creation, before any data is collected. Filename should carry the investigation reference and date.
Screenshot each source page at the point of first capture, with the browser's address bar visible, the full URL visible, and the system clock visible.
Preserve original URLs in the log, not shortened versions.
For each Companies House or public-register lookup, screenshot the result page with the filing date and the officer name visible.
Retain the balancing-test document as a dated PDF in the investigation folder, generated at the point of the decision and not modified afterwards. Subsequent versions go in separate files.
Do not annotate original captures. Annotations go on copies. The unannotated originals are what survive.
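A consistent filename pattern makes the investigation reference and capture date recoverable from the file itself. This is a sketch of one possible convention, not a required standard. The function `capture_filename`, the reference format and the timestamp layout are assumptions.

```python
import re
from datetime import datetime, timezone


def _slug(text: str) -> str:
    """Lower-case the text and collapse anything non-alphanumeric into hyphens."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()


def capture_filename(investigation_ref: str, label: str,
                     captured_at: datetime, ext: str = "png") -> str:
    """Build a filename carrying the investigation reference and UTC capture time."""
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{_slug(investigation_ref)}_{stamp}_{_slug(label)}.{ext}"


name = capture_filename(
    "INV-2024-007",                       # hypothetical reference
    "Companies House officer page",
    datetime(2024, 3, 5, 14, 30, 0, tzinfo=timezone.utc),
)
print(name)  # inv-2024-007_20240305T143000Z_companies-house-officer-page.png
```

Pass a timezone-aware `datetime` so the conversion to UTC does not depend on the machine's local settings.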
3.5 Data preservation and chain of custody
Collection logs, source captures and the balancing-test document together form the defensible record. Preserve them in a per-investigation folder with file-level hashing at the point of capture. SHA-256 hashes recorded in the log at capture time allow you to demonstrate, later, that the files have not been modified since. If a subject access request or ICO complaint follows, the hash chain is what turns "we documented it" into something verifiable.
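File-level hashing at capture, and a later check against the recorded values, can be sketched with Python's standard `hashlib`. The function names `sha256_of` and `modified_since_capture` are hypothetical, as is the dictionary of recorded hashes. The sketch hashes each file independently and does not link hashes into a chained structure.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def modified_since_capture(recorded: dict[str, str], folder: Path) -> list[str]:
    """Return names of files that are missing or no longer match their recorded hash."""
    changed = []
    for name, expected in recorded.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(name)
    return changed


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "capture.html").write_bytes(b"<html>original</html>")
    recorded = {"capture.html": sha256_of(folder / "capture.html")}
    print(modified_since_capture(recorded, folder))  # []
    (folder / "capture.html").write_bytes(b"<html>edited</html>")
    print(modified_since_capture(recorded, folder))  # ['capture.html']
```

The check shows only that a file matches the hash in the log. It is persuasive to the extent the log itself can be shown to be unaltered.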
Verification and analysis for reporting
4.1 Corroboration strategy
The legal layer is verified against statute and regulator guidance, not against source agreement. Before publication, confirm the following are in place:
- The Article 6 lawful basis named in the collection log matches the processing actually carried out.
- The three-part balancing test for Article 6(1)(f) is recorded in writing and dated before collection began, not reconstructed afterwards.
- The journalism exemption, where relied on, is documented against the specific GDPR provisions it disapplies.
- The scope exclusion list was observed in practice; no data outside the scope was retained.
- In-house counsel or an external lawyer has reviewed the record before publication where the subject is a private individual or the collection touches special category data.
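The dating requirement in the checks above can be tested directly: the decision record must predate every capture. The sketch below assumes the timestamps come from the collection log. The function name `recorded_before_collection` and the example times are hypothetical.

```python
from datetime import datetime, timezone


def recorded_before_collection(decision_recorded_at: datetime,
                               capture_times: list[datetime]) -> bool:
    """True only if the decision record is strictly earlier than every capture.

    An empty capture list returns True, since nothing was collected before the record.
    """
    return all(decision_recorded_at < captured for captured in capture_times)


decision = datetime(2024, 3, 5, 9, 0, tzinfo=timezone.utc)
captures = [
    datetime(2024, 3, 5, 10, 15, tzinfo=timezone.utc),
    datetime(2024, 3, 6, 16, 40, tzinfo=timezone.utc),
]
print(recorded_before_collection(decision, captures))  # True

# A capture made before the decision was recorded fails the check.
early = [datetime(2024, 3, 4, 18, 0, tzinfo=timezone.utc)] + captures
print(recorded_before_collection(decision, early))     # False
```

Use timezone-aware timestamps throughout, since Python refuses to compare aware and naive `datetime` values.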
4.2 Technical caveats and false positives
The journalism exemption is not a blanket defence. The exemption in DPA 2018 Schedule 2 Part 5 disapplies named provisions under named conditions, and only for as long as those conditions are met. Article 6 is among the listed provisions, but that disapplication falls away if any condition fails, so it does not replace naming a lawful basis at the start. The exemption does not disapply the security obligations under Article 32. Investigators who treat the exemption as a general licence produce records that fail on review. The exemption is a tool; it is not a shield.
Public does not mean freely processable. Data lawfully in the public domain is still personal data under Article 4(1) UK GDPR. Its presence on a public website is a relevant factor in the Article 6(1)(f) balancing test, but it does not remove the requirement for a lawful basis. The ICO has taken enforcement action against organisations that scraped public data on the assumption that "public equals fair game".
Subject of legitimate public interest is not the same as any named individual. The balancing test for Article 6(1)(f) weighs against the rights and freedoms of the specific data subject. A CEO in a position of public accountability is not in the same position as an employee who witnessed misconduct but is not themselves the subject of the story. The balancing test applies to each identifiable person in the collection, not only to the principal subject.
Historical archive does not neutralise live processing. Scraping a Wayback Machine capture, a cached search result or a mirrored archive is still processing personal data at the point you collect it. The original publication's age reduces but does not eliminate the rights of the subject, particularly where the subject has subsequently exercised erasure rights.
Recording the basis after collection is not defensible. A collection log created retrospectively, to justify processing already carried out, is not a collection log. It is a reconstruction. Under both ICO enforcement and civil discovery, the dating of the decision record matters. If the log did not exist before collection, treat it as absent.
4.3 Linking data to narrative
| Technical finding | Journalistic interpretation |
|---|---|
| Article 6(1)(f) balancing test, dated and recorded before collection | Justification for processing the subject's personal data is defensible in writing |
| Scope exclusion list, observed in practice | Collection was proportionate under Article 5(1)(c) |
| Journalism exemption recorded against specific disapplied provisions | The exemption was relied on for identifiable reasons, not as a general claim |
| Stop condition met, collection ended | The investigation was bounded; scope creep did not occur |
| SHA-256 hash chain intact across all captured artefacts | The collected record has not been modified after the fact |
| Counsel review signed off before publication | Editorial and legal sign-off chain is documented |
4.4 AI assistance in the research browser
Practice and resources
5.1 Practice exercise
- Write the investigative question in a single sentence. Check whether the collection you carried out matches the question or exceeds it.
- Name the Article 6 lawful basis you would have recorded at the start. Write the three-part balancing test as you would have recorded it then, not as you would now.
- Identify any special category data collected. Name the Article 9 condition that would have applied.
- List the categories of data collected that the investigative question did not require. These are your minimisation failures.
- Identify the stop condition. If you cannot identify one, the collection did not have one.
- Write a one-paragraph assessment of whether the record would be defensible under an ICO complaint. This paragraph is the deliverable.
5.2 Advanced resources
- ICO Guide to UK GDPR: UK data protection regulator. The primary reference for lawful basis analysis, data minimisation, and special category data conditions. Cite this when recording a balancing test. ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources
- Data Protection Act 2018 Schedule 2: UK statute. The source text for the journalism exemption. Read directly, not through secondary summaries. The disapplied provisions and their conditions are specific. legislation.gov.uk/ukpga/2018/12/schedule/2
- ICO Data Protection and Journalism Code of Practice: UK regulator, statutory code. Required reading where the journalism exemption is relied on. The code is admissible in evidence in legal proceedings, and courts and the ICO must take it into account where relevant. ico.org.uk/for-organisations/data-protection-and-journalism-code-of-practice
- Council of Europe, Convention 108+: intergovernmental organisation. The modernised data protection convention. Useful for cross-border investigations where UK GDPR is not the only applicable regime. coe.int/en/web/data-protection/convention108-and-protocol
- Global Investigative Journalism Network, legal guides: practitioner network. Cross-jurisdictional guides on investigation law, including privacy regimes outside the UK. Useful when the subject is not UK-resident. gijn.org/resource-center
Session-start checklist
- Name the Article 6 lawful basis before collection begins, and record the three-part balancing test for Article 6(1)(f) in writing. A reconstructed log is not a log.
- The journalism exemption in DPA 2018 Schedule 2 Part 5 is conditional on three tests and disapplies named provisions. It is a tool, not a shield.
- Data lawfully in the public domain is still personal data. Its public availability is a factor in the balancing test; it does not remove the requirement for a lawful basis.
- Scope the collection to the minimum necessary under Article 5(1)(c), write the exclusions, and define the stop condition in advance. Scope creep is where proportionality defences fail.
- The legal layer and the device layer are independent. A compartmentalised investigation can still be unlawful collection; a lawful investigation can still leak identity through poor OPSEC. Treat them as separate controls.


