◆ Methods · Course Zero
GDPR ICO guidance · case law Foundational
Introduction

Introduction and context

1.1 The investigative need

OSINT practitioners operate under data protection law the moment they collect information about an identifiable living person. Under UK GDPR, that collection is lawful only if it has a named basis in Article 6. For journalism and public-interest investigation, that basis is almost always legitimate interests under Article 6(1)(f), qualified by the journalism exemption in Schedule 2 Part 5 of the Data Protection Act 2018. The journalism exemption is conditional, not absolute.

Defensible collection means three things. First, you can name the lawful basis before you start. Second, the scope of what you collect is proportionate to the investigative question. Third, you can explain, after the fact, why each piece of collected data was necessary. Investigators who cannot do this face enforcement risk from the ICO, civil claims from subjects, and editorial rejection from publications whose legal teams will not run the story.

This tutorial walks one investigation type end to end: investigating a named individual suspected of corporate wrongdoing. It gives you a decision flowchart you can apply the same way every time.

1.2 Learning outcomes

After completing this tutorial, you will be able to:
  • Identify the Article 6 UK GDPR lawful basis for a specific OSINT collection task before starting it.
  • Apply the three-part balancing test for legitimate interests and record the reasoning in a form an editor or counsel can review.
  • Scope a collection operation to meet the data minimisation principle under Article 5(1)(c).
  • Distinguish defensible public-interest collection from conduct that crosses into doxxing, harassment or unauthorised access.
  • Produce a written collection decision record that is defensible under an ICO complaint or a civil claim.

1.3 Threat model

This tutorial defends against three specific threats:
  • ICO enforcement action for collection that lacks an identifiable lawful basis.
  • Civil claims by the subject under Article 82 UK GDPR for unlawful processing.
  • Editorial or in-house counsel rejection of a story on the basis that the collection cannot be justified.
It partially defends against: defamation claims. Lawful collection is necessary but not sufficient; the published claim must also meet defamation defences.
It does not defend against: criminal liability under the Computer Misuse Act 1990, harassment claims under the Protection from Harassment Act 1997, or jurisdictional exposure under non-UK privacy regimes such as EU GDPR where the subject is EU-resident, Swiss FADP, or US state privacy statutes. If your threat model includes any of those, this technique is necessary but not sufficient.
Theory and framework

Foundational theory and ethical-legal framework

Personal data: any information relating to an identified or identifiable living individual, under Article 4(1) UK GDPR. A name alone qualifies. A social media handle that can be linked to a real person qualifies. The threshold is low.

Processing: any operation performed on personal data, under Article 4(2) UK GDPR. Collection, storage, consultation, retrieval, organisation and dissemination all count. Opening a public profile and copying its contents into your case file is processing.

Lawful basis: one of the six grounds in Article 6(1) UK GDPR that make processing lawful. For OSINT by journalists and investigators, the relevant ground is almost always Article 6(1)(f), legitimate interests. You must name the basis before collection begins.

Journalism exemption: the set of derogations in Schedule 2 Part 5 of the Data Protection Act 2018, which disapply certain GDPR provisions for processing carried out with a view to the publication of journalistic material in the public interest. It is conditional, and it protects processing for publication, not processing for private use.

Data minimisation: the principle in Article 5(1)(c) UK GDPR that personal data shall be adequate, relevant and limited to what is necessary for the stated purpose. Collecting more than the investigation requires is itself a violation, independent of what is done with the excess.

Ethical boundary · stop at the login
If data is behind authentication that the subject erected to control access, it is not public. Creating a sockpuppet account to view a private profile, accepting a friend request under a false identity, scraping gated APIs, or using leaked credentials is outside OSINT and outside the journalism exemption. The rule is: if it requires a login credential the subject did not knowingly make available to the general public, do not collect it without prior written sign-off from counsel.
Legal considerations

This tutorial is written against UK GDPR and the Data Protection Act 2018. Where the subject is resident in an EU member state or a third country, additional obligations apply that this tutorial does not cover.

Two specific legal points apply to the collection phase.

First, the journalism exemption in Schedule 2 Part 5 DPA 2018 disapplies certain GDPR provisions only where processing is carried out with a view to the publication of journalistic, academic, artistic or literary material, where the controller reasonably believes publication would be in the public interest, and where compliance with the disapplied provision would be incompatible with that purpose. All three tests must be satisfied. Private collection for future unspecified use does not qualify.

Second, Article 6(1)(f) legitimate interests requires a three-part balancing test: a named legitimate interest, necessity of the processing to achieve it, and a balance between the interest and the data subject's rights and freedoms. The ICO has published guidance treating investigative journalism as a legitimate interest, but the balancing test still applies on each collection.

2.3 Device compartmentalisation

This tutorial works at the legal and procedural layer, not the device layer. It assumes the reader is already operating from the compartmentalised research environment set up in CZ1: an investigative browser profile separated from personal use, a VPN routing research traffic, and a sockpuppet identity that does not link to the investigator's real identity. Those controls address the question of what the subject can observe during collection. They do not address the question of whether the collection is lawful.

Lawful basis analysis is separate from OPSEC. A perfectly compartmentalised investigation can still be unlawful collection. Conversely, a legally defensible investigation can still leak operator identity to the subject through poor OPSEC. Treat the two as orthogonal controls.

Compartmentalisation · know the layer you are working at
This tutorial works at the legal and procedural layer. It does not cover device isolation, network isolation, persona hygiene, or the forensic integrity of collected artefacts. Those are addressed in CZ1 and in Block 3. The legal layer and the device layer are independent: satisfying one does not satisfy the other. Match the lawful basis to the investigation, match the compartmentalisation layer to the subject's observational capability, and treat them as separate decisions.
Applied methodology

3.1 Required tools and setup

ICO Guide to UK GDPR: primary reference for lawful basis analysis and data minimisation under UK law. Used as the authority to cite when recording a balancing test. Free, maintained by the ICO.

Data Protection Act 2018, Schedule 2 Part 5: statutory source for the journalism exemption. Read directly, not through secondary summaries. The disapplied provisions and the conditions for disapplication are specific, and summaries routinely simplify them.

A written collection log: plain text file or spreadsheet in the research environment, structured to record the lawful basis, the balancing test where applicable, the scope of collection, and the retention decision for each investigation. The format is less important than the habit. A collection log you do not maintain is worse than none because it creates a false record.

Case management folder structure: a per-investigation folder that holds the collection log, the collected artefacts and the decision record. Treat its contents as discoverable under a subject access request. The structure should make it possible to demonstrate what was collected, when, and under what basis.

3.2 OPSEC and target awareness

OPSEC · target awareness
The subject can observe: their own platform analytics where applicable, any profile views from authenticated accounts, any friend or follow requests received, and, for a subject with technical resources, HTTP referrer patterns on sites they control. Device-layer OPSEC from CZ1 addresses the first three. It does not address the fourth. The legal layer has no OPSEC value on its own; an unlawful collection is no less observable than a lawful one. The mitigating control for legal exposure is documentation at the point of collection, not concealment after the fact. If an ICO complaint arrives twelve months later, the collection log is the defence.

Treat documentation as a control against legal exposure, not as an embarrassment. An investigator who can produce a dated, reasoned collection record is in a materially stronger position than one who cannot, even where the underlying collection was identical.

3.3 Practical execution

STEP 01
State the investigative question
Goal: define the scope before defining the basis
Write a single sentence naming what you are trying to establish. For the worked scenario: "Whether the subject's conduct as CEO is consistent with the financial misconduct alleged by the former employee." The question controls the scope. A vague question produces a proportionality failure under Article 5(1)(c). Record the question in the collection log before opening a browser.
STEP 02
Name the Article 6 lawful basis
Goal: match processing to a named statutory ground
For investigative journalism on a matter of public interest concerning an identifiable living person, the basis is Article 6(1)(f) legitimate interests. Record the basis explicitly. Then apply the three-part test: the legitimate interest (public accountability of corporate leadership in a matter of public interest), the necessity (the information cannot be obtained by less intrusive means), and the balance (the subject's rights and freedoms against the public-interest weight of the investigation). Record all three in the log.
STEP 03
Check whether the journalism exemption applies
Goal: determine which GDPR provisions are disapplied
Schedule 2 Part 5 DPA 2018 disapplies specified GDPR provisions where all three conditions are met: processing for the purpose of publication of journalistic material, reasonable belief that publication is in the public interest, and incompatibility of compliance with the journalistic purpose. For the worked scenario, all three will typically be satisfied. Record which specific provisions you are relying on the exemption to disapply, on the facts of this investigation.
STEP 04
Scope the collection to the minimum necessary
Goal: satisfy Article 5(1)(c) data minimisation
List the categories of data required to answer the investigative question. For the worked scenario: Companies House filings naming the subject, LinkedIn and other professional profile data relating to their role, named-director records, any public statements made in their professional capacity. Exclude: family members unless operationally relevant, home address, private social media reflecting personal life unconnected to the role, data about minors. Scope exclusions in writing are part of defensibility, not a nicety.
STEP 05
Define the stop condition
Goal: identify in advance when collection ends
Name the conditions under which collection stops. Typically: the investigative question is answered, the question is definitively disproved, the investigation is killed editorially, or the scope has exceeded the original lawful basis and requires a new one. Collection without a stop condition trends toward scope creep, which is where proportionality defences fail. Record the stop condition in the log alongside the basis.

| Tool or operator | Investigative query | Investigative value |
| --- | --- | --- |
| ICO Guide to UK GDPR | Which lawful basis fits this specific collection task | Confirms the named Article 6 ground before processing begins |
| DPA 2018 Schedule 2 Part 5 | Which GDPR provisions can the journalism exemption disapply, given the facts | Identifies disapplications and their conditions |
| Collection log (plain text or spreadsheet) | Record the question, basis, balancing test, scope, exclusions and stop condition | Produces the defensible record |
| Three-part balancing test (written) | Legitimate interest, necessity, balance | Satisfies the Article 6(1)(f) reasoning requirement |
| Scope exclusion list (written) | Which categories of data will not be collected, and why | Evidence of Article 5(1)(c) compliance |
| Stop condition (written) | What ends the collection | Bounds the investigation and supports proportionality |
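
The items recorded in Steps 01 to 05 can be held in a small structured record rather than free text. The sketch below is one possible layout, not a prescribed format: the class names, field names and the `INV-EXAMPLE` reference are all hypothetical, and a spreadsheet with the same columns would serve equally well. It only checks that each field has been filled in; it says nothing about whether the reasoning is legally sound.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class BalancingTest:
    """The three parts of the Article 6(1)(f) test, as free text."""
    legitimate_interest: str
    necessity: str
    balance: str


@dataclass
class CollectionLogEntry:
    """One investigation's decision record (hypothetical field names)."""
    reference: str
    question: str
    lawful_basis: str
    balancing_test: BalancingTest
    in_scope: list[str]
    exclusions: list[str]
    stop_conditions: list[str]
    # Provisions the journalism exemption is relied on to disapply, if any.
    exemption_provisions: list[str] = field(default_factory=list)
    # UTC timestamp set when the entry is created.
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "question": self.question.strip(),
            "lawful_basis": self.lawful_basis.strip(),
            "legitimate_interest": self.balancing_test.legitimate_interest.strip(),
            "necessity": self.balancing_test.necessity.strip(),
            "balance": self.balancing_test.balance.strip(),
            "in_scope": self.in_scope,
            "exclusions": self.exclusions,
            "stop_conditions": self.stop_conditions,
        }
        return [name for name, value in required.items() if not value]

    def to_json(self) -> str:
        """Serialise the entry for storage in the investigation folder."""
        return json.dumps(asdict(self), indent=2)


entry = CollectionLogEntry(
    reference="INV-EXAMPLE",
    question=(
        "Whether the subject's conduct as CEO is consistent with the "
        "financial misconduct alleged by the former employee."
    ),
    lawful_basis="Article 6(1)(f) legitimate interests",
    balancing_test=BalancingTest(
        legitimate_interest="Public accountability of corporate leadership",
        necessity="",  # deliberately left blank to show the check
        balance="Subject's rights weighed against the public-interest weight",
    ),
    in_scope=["Companies House filings naming the subject"],
    exclusions=["Home address", "Data about minors"],
    stop_conditions=["Investigative question answered"],
)

print(entry.missing_fields())  # ['necessity']
```

Running the completeness check before opening a browser gives a mechanical reminder that the record is unfinished; the content of each field still needs human judgement.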

3.4 Visual documentation standards

  • Screenshot the collection log entry for each investigation at the point of creation, before any data is collected. Filename should carry the investigation reference and date.

  • Screenshot each source page at the point of first capture, with the browser's address bar visible, the full URL visible, and the system clock visible.

  • Preserve original URLs in the log, not shortened versions.

  • For each Companies House or public-register lookup, screenshot the result page with the filing date and the officer name visible.

  • Retain the balancing-test document as a dated PDF in the investigation folder, generated at the point of the decision and not modified afterwards. Subsequent versions go in separate files.

  • Do not annotate original captures. Annotations go on copies. The unannotated originals are what survive.
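
A consistent filename scheme makes the "investigation reference and date" rule above easy to follow. The following is a minimal sketch under stated assumptions: the function name, the `reference_timestamp_label.ext` pattern and the use of UTC are illustrative choices, not a standard.

```python
import re
from datetime import datetime, timezone


def capture_filename(reference: str, label: str, when: datetime, ext: str = "png") -> str:
    """Build a filename carrying the investigation reference and capture time.

    `when` should be timezone-aware; a naive datetime is interpreted as
    local time by astimezone().
    """
    # Collapse anything that is not a letter or digit into single hyphens.
    safe_label = re.sub(r"[^A-Za-z0-9]+", "-", label).strip("-").lower()
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{reference}_{stamp}_{safe_label}.{ext}"


name = capture_filename(
    "INV-EXAMPLE",
    "Companies House officer page",
    datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(name)  # INV-EXAMPLE_20250102T030405Z_companies-house-officer-page.png
```

The original URL still belongs in the log itself; the filename only needs to be enough to find the right log entry.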

3.5 Data preservation and chain of custody

Collection logs, source captures and the balancing-test document together form the defensible record. Preserve them in a per-investigation folder with file-level hashing at the point of capture. SHA-256 hashes recorded in the log at capture time allow you to demonstrate, later, that the files have not been modified since. If a subject access request or ICO complaint follows, the hash chain is what turns "we documented it" into something verifiable.

Command reference · file hashing at capture
macOS / Linux
shasum -a 256 capture.png >> collection_log_hashes.txt
Windows PowerShell
Get-FileHash -Algorithm SHA256 capture.png | Out-File -Append collection_log_hashes.txt
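
The same hashing step can be scripted so that recording and later verification use one tool on every platform. This is a sketch only: the function names and the tab-separated log layout (timestamp, hash, filename) are assumptions made for the example and differ from the output format of the commands above. It uses Python's standard `hashlib` module.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_hash(path: Path, log_path: Path) -> str:
    """Append 'timestamp<TAB>hash<TAB>filename' to the hash log."""
    file_hash = sha256_of(path)
    stamp = datetime.now(timezone.utc).isoformat()
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{file_hash}\t{path.name}\n")
    return file_hash


def verify_hashes(log_path: Path, folder: Path) -> list[str]:
    """Return logged filenames that are missing or no longer match."""
    failures = []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        _stamp, logged_hash, name = line.split("\t", 2)
        target = folder / name
        if not target.is_file() or sha256_of(target) != logged_hash:
            failures.append(name)
    return failures
```

Call `record_hash` once at the point of capture and `verify_hashes` whenever the record is reviewed. An empty list means every logged file is present and unchanged. Note that a hash log stored next to the files it describes can itself be edited, so this demonstrates consistency rather than proving it to a hostile reviewer.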
Verification and analysis

Verification and analysis for reporting

4.1 Corroboration strategy

The legal layer is verified against statute and regulator guidance, not against source agreement. Before publication, confirm the following are in place:

  • The Article 6 lawful basis named in the collection log matches the processing actually carried out.

  • The three-part balancing test for Article 6(1)(f) is recorded in writing and dated before collection began, not reconstructed afterwards.

  • The journalism exemption, where relied on, is documented against the specific GDPR provisions it disapplies.

  • The scope exclusion list was observed in practice; no data outside the scope was retained.

  • In-house counsel or an external lawyer has reviewed the record before publication where the subject is a private individual or the collection touches special category data.
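
The "dated before collection began" check in the list above is mechanical enough to script. A minimal sketch, assuming the log stores a timestamp for each decision document and for each capture (the function and argument names are hypothetical). It tests only the ordering of recorded timestamps, which are self-reported; it cannot show that the reasoning was adequate.

```python
from datetime import datetime


def decisions_after_first_capture(
    decision_times: dict[str, datetime],
    capture_times: list[datetime],
) -> list[str]:
    """Return decision documents dated at or after the first capture.

    An empty result means every decision record predates collection.
    """
    if not capture_times:
        return []  # nothing collected yet, so nothing can be late
    first_capture = min(capture_times)
    return sorted(
        name for name, recorded in decision_times.items()
        if recorded >= first_capture
    )


decisions = {
    "balancing_test": datetime(2025, 3, 1, 9, 0),
    "scope_exclusions": datetime(2025, 3, 1, 9, 15),
    "stop_condition": datetime(2025, 3, 2, 14, 0),  # written late
}
captures = [datetime(2025, 3, 1, 10, 0), datetime(2025, 3, 3, 11, 30)]

print(decisions_after_first_capture(decisions, captures))  # ['stop_condition']
```

Any name returned here identifies a document that would count as a reconstruction rather than a contemporaneous record.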

4.2 Technical caveats and false positives

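The journalism exemption is not a blanket defence. The exemption in DPA 2018 Schedule 2 Part 5 disapplies listed provisions under named conditions, and only to the extent that complying with each one would be incompatible with the journalistic purpose. It does not disapply the security obligations under Article 5(1)(f) and Article 32, or the accountability principle under Article 5(2). The lawful basis and accuracy requirements are among the listed provisions, but they fall away only where the incompatibility condition is met for that provision on the facts, so in practice you should still name a basis and check accuracy. Investigators who treat the exemption as a general licence produce records that fail on review. The exemption is a tool; it is not a shield.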

Public does not mean freely processable. Data lawfully in the public domain is still personal data under Article 4(1) UK GDPR. Its presence on a public website is a relevant factor in the Article 6(1)(f) balancing test, but it does not remove the requirement for a lawful basis. The ICO has taken enforcement action against organisations that scraped public data on the assumption that "public equals fair game".

Subject of legitimate public interest is not the same as any named individual. The balancing test for Article 6(1)(f) weighs against the rights and freedoms of the specific data subject. A CEO in a position of public accountability is not in the same position as an employee who witnessed misconduct but is not themselves the subject of the story. The balancing test applies to each identifiable person in the collection, not only to the principal subject.

Historical archive does not neutralise live processing. Scraping a Wayback Machine capture, a cached search result or a mirrored archive is still processing personal data at the point you collect it. The original publication's age reduces but does not eliminate the rights of the subject, particularly where the subject has subsequently exercised erasure rights.

Recording the basis after collection is not defensible. A collection log created retrospectively, to justify processing already carried out, is not a collection log. It is a reconstruction. Under both ICO enforcement and civil discovery, the dating of the decision record matters. If the log did not exist before collection, treat it as absent.

4.3 Linking data to narrative

| Technical finding | Journalistic interpretation |
| --- | --- |
| Article 6(1)(f) balancing test, dated and recorded before collection | Justification for processing the subject's personal data is defensible in writing |
| Scope exclusion list, observed in practice | Collection was proportionate under Article 5(1)(c) |
| Journalism exemption recorded against specific disapplied provisions | The exemption was relied on for identifiable reasons, not as a general claim |
| Stop condition met, collection ended | The investigation was bounded; scope creep did not occur |
| SHA-256 hash chain intact across all captured artefacts | The collected record has not been modified after the fact |
| Counsel review signed off before publication | Editorial and legal sign-off chain is documented |

4.4 AI assistance in the research browser

AI assistance · legal reasoning is where AI hallucinates most confidently
Large language models produce fluent, plausible legal analysis that is wrong in specific ways. They fabricate case citations. They conflate UK GDPR with EU GDPR where the two have diverged. They state the journalism exemption as broader than it is. The technique-specific threat here is that an AI-drafted balancing test reads like a defensible record but fails under review because the cited authorities do not say what the draft claims. Use AI to structure the log format or to draft a first-pass balancing test, never to cite statute or case law without manual verification against the primary source.
Privacy warning
Do not paste the subject's name, the investigation scope or the collected material into hosted AI providers. Hosted-provider terms typically permit training on submitted content. Use a local LLM or a verified privacy-mode enterprise deployment for any prompt that contains personal data about an investigation subject. The act of pasting personal data into a hosted provider is itself processing under Article 4(2) UK GDPR and requires its own lawful basis.
Verification warning
AI output on statutory interpretation requires human verification against the primary source. Every claimed provision, paragraph number or case reference must be checked against the ICO Guide or legislation.gov.uk before the output enters the record. AI cannot verify law; it can only pattern-match against its training data, which is by definition stale.
Practice and resources

5.1 Practice exercise

Practice exercise
Take a recent OSINT investigation you have carried out, published or not, and produce a retrospective collection log for it. The goal is to see where the record would have failed if it had been challenged.
  1. Write the investigative question in a single sentence. Check whether the collection you carried out matches the question or exceeds it.
  2. Name the Article 6 lawful basis you would have recorded at the start. Write the three-part balancing test as you would have recorded it then, not as you would now.
  3. Identify any special category data collected. Name the Article 9 condition that would have applied.
  4. List the categories of data collected that the investigative question did not require. These are your minimisation failures.
  5. Identify the stop condition. If you cannot identify one, the collection did not have one.
  6. Write a one-paragraph assessment of whether the record would be defensible under an ICO complaint. This paragraph is the deliverable.
Estimated time: 60 minutes

5.2 Advanced resources

Session-start checklist

Session-start checklist · 5 minutes, every session
  • Open the collection log for the current investigation. If none exists, create one before proceeding.
  • Confirm the Article 6 lawful basis recorded at investigation start still fits the work planned for this session.
  • Check whether today's planned collection exceeds the original scope. If it does, stop and revise the basis before proceeding.
  • Confirm the stop condition. If you are within one step of it, plan for collection to end this session.
  • Log the session start timestamp. Hash any new captures at the point of capture, not at the end of the session.
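
The scope check in the checklist can be reduced to a set comparison if the log records data categories as short labels. The sketch below assumes exactly that; the function name and the category labels are hypothetical, and deciding which label a given piece of data falls under remains a human decision.

```python
def scope_problems(
    planned: set[str],
    in_scope: set[str],
    excluded: set[str],
) -> dict[str, set[str]]:
    """Compare today's planned categories against the recorded scope.

    'excluded' lists planned categories the log explicitly rules out.
    'unlisted' lists planned categories the log never mentions, which
    means the basis and scope need revisiting before collection.
    """
    return {
        "excluded": planned & excluded,
        "unlisted": planned - in_scope - excluded,
    }


problems = scope_problems(
    planned={"companies_house_filings", "home_address", "property_records"},
    in_scope={"companies_house_filings", "professional_profiles"},
    excluded={"home_address", "data_about_minors"},
)
print(problems["excluded"])  # {'home_address'}
print(problems["unlisted"])  # {'property_records'}
```

If either set is non-empty, the session should pause until the log has been revised or the plan narrowed.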
Key takeaways
  1. Name the Article 6 lawful basis before collection begins, and record the three-part balancing test for Article 6(1)(f) in writing. A reconstructed log is not a log.
  2. The journalism exemption in DPA 2018 Schedule 2 Part 5 is conditional on three tests and disapplies named provisions. It is a tool, not a shield.
  3. Data lawfully in the public domain is still personal data. Its public availability is a factor in the balancing test; it does not remove the requirement for a lawful basis.
  4. Scope the collection to the minimum necessary under Article 5(1)(c), write the exclusions, and define the stop condition in advance. Scope creep is where proportionality defences fail.
  5. The legal layer and the device layer are independent. A compartmentalised investigation can still be unlawful collection; a lawful investigation can still leak identity through poor OPSEC. Treat them as separate controls.
Next in this series
Tutorial 1 Pinpointing location through visual clues: shadows, architecture and terrain
The first tutorial in Block 1. A single still from a recent event, worked end to end using sun-shadow analysis, architecture and terrain. Publishes 4 May 2026.
Evidentiary Standard
This tutorial was produced using the Signal & Shadow methodology framework. All techniques described apply only to publicly accessible data. No method described here involves or endorses unauthorised access to systems or data. Verify all findings independently before publication.
About Signal & Shadow
Signal & Shadow is an independent forensic investigation and methodology practice. Methods is the structured tutorial series, published for investigators and journalists working in OSINT and digital forensics.
