Working With Novel Surveillance Data

We outline public defenders’ needs when working with novel surveillance data, relevant constraints for system designers, and opportunities for policy advocacy.

Overview


Surveillance Data and Discovery: Public defenders (PDs) are overwhelmed by the volume and complexity of data that characterizes modern criminal cases. They would benefit from technical tools to process and analyze that data for their clients.

Though public defenders can subpoena evidence from private companies and collect evidence through independent investigations, they receive the majority of their data as “discovery” from prosecutors, who in turn acquire that data through partnerships with law enforcement, from public surveillance systems, or directly from private companies. Thus, public defenders often have little control over the format or (enormous) volume of data they receive. The two most important (and burdensome) forms of discovery are body camera footage and social media data, though public defenders have similar difficulties interpreting digital forensics reports, historical cell site records, and email dumps.

Below, we describe the public defender workflow and needs around social media data.

Every single thing from the cops [to] laboratory analysts … there’s always some element of human decision-making. We need to hire experts [and we] make that person reinvent the whole wheel. Then it’s not just to tell us, did that analyst get the right result? … But the way they phrase the result, is that really an accurate depiction? … Or were they trying to kind of fudge the numbers on the margins?

– Capital Public Defender

Social Media Data: In some instances, PDs may be able to extract evidence in support of their clients from social media data. In other instances, prosecutors use social media data to connect a client to a crime or to paint a narrative of criminality, which PDs can disprove by putting a client’s words and actions in the proper context. Unfortunately, making sense of social media data can be time-consuming. Responses to subpoenas and requests sent to Facebook, for example, are usually returned as semi-structured HTML or JSON blobs that include every transaction, message, and media item (including deleted posts and messages) associated with a person since they signed up. When the data instead arrives from prosecutors, PDs usually receive social media reports as unstructured PDFs (as long as 25,000 pages). Some paralegals and tech-savvy PDs are able to write simple scripts to extract this data into a more usable format, but many others resort to reading (or failing to read) through the data manually.
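As an illustration of the kind of simple script mentioned above, here is a minimal Python sketch that flattens a JSON message dump into CSV rows. The field names (`messages`, `sender`, `timestamp`, `text`, `deleted`) and the sample data are assumptions made for this example; real platform returns use different layouts that change over time.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical export layout, invented for illustration only.
SAMPLE_DUMP = """
{"messages": [
  {"sender": "user_b", "timestamp": 1580605200, "text": "ok", "deleted": true},
  {"sender": "user_a", "timestamp": 1580601600, "text": "see you at 5", "deleted": false}
]}
"""

FIELDS = ["timestamp_utc", "sender", "text", "deleted"]


def flatten_messages(raw_json):
    """Turn the nested dump into flat rows, one per message, oldest first."""
    data = json.loads(raw_json)
    messages = sorted(data.get("messages", []), key=lambda m: m["timestamp"])
    rows = []
    for msg in messages:
        sent = datetime.fromtimestamp(msg["timestamp"], tz=timezone.utc)
        rows.append({
            "timestamp_utc": sent.isoformat(),
            "sender": msg.get("sender", ""),
            "text": msg.get("text", ""),
            # Keep the deleted flag so reviewers can see what was removed.
            "deleted": bool(msg.get("deleted", False)),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text that opens in any spreadsheet tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = flatten_messages(SAMPLE_DUMP)
csv_text = rows_to_csv(rows)
print(csv_text)
```

A spreadsheet-friendly output like this lets staff sort and search by sender or time without reading the raw blob, though any real tool would need to be adapted to the actual format received.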

Storyboard

Explore this depiction of a public defender’s experiences receiving and managing surveillance data—in this case, social media data.

1. Public defenders often receive a huge amount of data either through subpoenas or from prosecutors during discovery.

2. Public defenders don’t always have the time or technical expertise to go through this data, so they may hire experts to help.

3. Without an expert, public defenders often rely on paralegals to help organize and make sense of the data.

4. From the data, public defenders and paralegals need to sift out relevant time-bound information, such as text exchanges between a defendant and another specific person. Within that, they then look for information that can be used as evidence, such as a specific emoji that indicates a person’s individual use of language.

5. Public defenders and paralegals have to collate data from separate sources, such as prosecutors and different private social media companies.

6. Ideally, any meaningful or relevant data should be easily manageable so that public defenders can readily present it in court.

Issues & Needs

See the issues and needs associated with each step of the process. Each step is listed first, followed by its issues (grouped by theme) and then the corresponding needs.

1. Public defenders often receive a huge amount of data either through subpoenas or from prosecutors during discovery.

Issues: Data Format Standardization

Social media companies can take a while to respond to subpoenas, giving public defenders less time to sift through the data
Public defenders most commonly receive data from prosecutors through discovery, but the formats of that data vary in how parseable they are

Issues: Proprietary Technology Access

Public defenders may receive data in proprietary formats
Without access to the corresponding proprietary technology, such as proprietary surveillance video software, they may be unable to open the data

Needs:

Data in standardized and parseable formats
Ability to better manage and understand raw data

We get a lot of cases where the feds have gotten a warrant to Instagram. And they will send us a [25 thousand page PDF] which is not usable. We’ll say, “give it to us in its native format.” And you know, we, we have much less [power], because the law doesn’t allow us to get records from Facebook

– Federal Public Defender

2. Hiring experts to go through the data

Issues: Acquiring Trustworthy Experts

Hiring experts can be very expensive (anywhere from $200 to $400 per hour, depending on their area of expertise), especially given the limited funds available to public defense offices
Trustworthy and reliable experts can be hard to find. Many public defenders find experts through word of mouth, but there is no comprehensive system for public defenders to share resources on experts

Needs:

Access to an affordable and reliable pool of technical experts, including those recommended by other public defenders
Understanding of what qualifications constitute a knowledgeable expert in a particular domain

This guy was good. Why was he good? That’s the challenge – understanding the background of a person and if they’re reliable and trustworthy

– Investigator in a public defense office

3. Paralegals organize data

Issues: Coordination & Time Pressure

Organizing data can be very time-consuming given the amount and variety of data in a case
Staff within a public defense office have few shared standards for entering and organizing data, which makes coordination and collaboration difficult

Needs:

Efficient and user-friendly ways to organize data when multiple users are involved

4. Finding relevant information in data

Issues: Data Wrangling & Analysis

Depending on their familiarity with a data source or technology, public defenders may not know what data is potentially meaningful for their case. For example, they may not be aware that a social media platform holds location and photo history data
Public defenders need a nuanced approach to data filtering given the variety of data sources and the types of expression within those sources. For example, in a case involving data from a social media platform, a public defender may want all chat history and posts involving a particular slang term, as well as emojis used with the same meaning

Needs:

Training for public defenders on existing federally prescribed technologies to process data
Understanding of different types of technologies and their corresponding data outputs
Ability to more easily filter for and discover relevant information from a variety of data sources
Flexibility in the ability to extract information with varying levels of nuance and granularity
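The filtering described in this step (exchanges between two specific people, within a time window, mentioning a slang term or an equivalent emoji) might be sketched in Python as follows. The message records, participant names, and the list of variants are hypothetical; in practice the variants would come from the client, an investigator, or an expert.

```python
from datetime import datetime, timezone

# Hypothetical variant list: a slang term plus an emoji used with the same meaning.
FIRE_VARIANTS = ["fire", "🔥"]


def utc(*args):
    """Build a timezone-aware UTC datetime for the sample data."""
    return datetime(*args, tzinfo=timezone.utc)


def mentions_any(text, variants):
    """Case-insensitive substring match; may over-match inside longer words."""
    lowered = text.casefold()
    return any(v.casefold() in lowered for v in variants)


def filter_messages(messages, people, start, end, variants):
    """Keep messages exchanged between exactly `people`, sent in [start, end),
    whose text mentions at least one variant."""
    wanted = set(people)
    return [
        m for m in messages
        if {m["sender"], m["recipient"]} == wanted
        and start <= m["sent"] < end
        and mentions_any(m["text"], variants)
    ]


# Invented sample records for illustration.
MESSAGES = [
    {"sender": "client", "recipient": "friend", "sent": utc(2020, 2, 1, 18, 0), "text": "that show was 🔥"},
    {"sender": "friend", "recipient": "client", "sent": utc(2020, 2, 1, 18, 5), "text": "Straight FIRE"},
    {"sender": "client", "recipient": "friend", "sent": utc(2020, 3, 9, 9, 0), "text": "fire again"},    # outside window
    {"sender": "client", "recipient": "cousin", "sent": utc(2020, 2, 1, 19, 0), "text": "🔥🔥"},          # other person
    {"sender": "friend", "recipient": "client", "sent": utc(2020, 2, 2, 8, 0), "text": "running late"},  # no term
]

hits = filter_messages(
    MESSAGES, ["client", "friend"], utc(2020, 2, 1), utc(2020, 2, 8), FIRE_VARIANTS
)
for m in hits:
    print(m["sent"].isoformat(), m["sender"], m["text"])
```

Plain substring matching is deliberately simple and easy to explain, but it can return false positives, so results would still need human review.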

5. Collating data from different sources

Issues: File Format Standardization

File formats are not standardized and differ by data type and source
Sources include jail calls, social media platforms, body cameras, and mobile devices

Needs:

Ability to identify the content of a file, regardless of the data type and source

It’s hard to nail an officer to a lie when … we don’t calibrate the clocks on our body camera, so I don’t know what was their exact time … it seems like we could make … those procedures uniform

– Public Defender
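One way to approach the need named in this step, identifying what a file contains regardless of its name or source, is to inspect its leading bytes. The Python sketch below checks a handful of well-known file signatures; the function names are illustrative, the list is far from complete, and proprietary evidence formats would not be recognized.

```python
# Leading-byte signatures for a few common formats (a small, incomplete list).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container (also .docx/.xlsx)"),
]


def identify_bytes(header):
    """Guess a file type from its first bytes, ignoring the file name."""
    for signature, label in SIGNATURES:
        if header.startswith(signature):
            return label
    # MP4/MOV files carry the marker "ftyp" at byte offset 4.
    if header[4:8] == b"ftyp":
        return "MP4/MOV-family video"
    # WAV audio is a RIFF container whose form type is "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV audio"
    # Text formats have no signature, so fall back to a rough heuristic.
    first_char = header.lstrip()[:1]
    if first_char in (b"{", b"["):
        return "probably JSON"
    if first_char == b"<":
        return "probably HTML or XML"
    return "unknown"


def identify_file(path, sniff_bytes=64):
    """Read only the start of the file, so large videos are cheap to check."""
    with open(path, "rb") as f:
        return identify_bytes(f.read(sniff_bytes))


print(identify_bytes(b"%PDF-1.7\n"))
print(identify_bytes(b"\x00\x00\x00\x18ftypmp42"))
print(identify_bytes(b'\n {"messages": []}'))
```

The text heuristics are only guesses (for instance, a file that starts with a byte-order mark would be reported as unknown), so the output is best treated as a sorting aid rather than a definitive identification.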

6. Presenting data in court

Issues: Relevant Data Extraction & Presentation

It can be time-consuming to extract, edit, and annotate data so that it can be presented effectively in court
Data is not always understood in the same way by all relevant parties, such as public defenders, paralegals, and investigators

Needs:

Ability to access highly relevant client data on demand
Ability to effectively communicate and coordinate with involved actors (public defenders, paralegals, and investigators)
Ability to more easily and compellingly present data in court

Solutions

Below, we provide a few potential approaches to addressing public defenders’ challenges working with surveillance data. We encourage you to consider these approaches as well as your own.

Tools that convert raw data (JSON, HTML files) into consumable and parsable formats (such as CSV, .docx)
System for multiple users to tag, comment on, and filter surveillance data
Software solutions to help speed up processing common forms of surveillance data such as:
- Social media: Consider developing a tool which takes a social media dump as raw JSON/HTML and extracts information (about posts, persons, and timestamps) in a way that is easily searchable
- Audio and video: Secure transcription tools and video editing software
- Body camera: Simple technical solutions for sorting, naming, and de-duplicating body camera information
- Digital Forensics: Software that processes digital forensics reports such as those from Cellebrite
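As one example of a simple technical solution from the list above, de-duplicating body camera files can be sketched by grouping files with identical content hashes. This Python sketch uses SHA-256 from the standard library; the file names and contents are made up for the demonstration.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in sorted(paths):
        by_hash[sha256_of(path)].append(path)
    return [group for group in by_hash.values() if len(group) > 1]


# Demonstration with small stand-in files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "clip_A.mp4").write_bytes(b"same footage")
    (root / "clip_A_copy.mp4").write_bytes(b"same footage")
    (root / "clip_B.mp4").write_bytes(b"other footage")
    duplicate_groups = [
        [p.name for p in group] for group in group_duplicates(root.iterdir())
    ]

print(duplicate_groups)
```

This only catches exact copies. Footage that has been re-encoded, trimmed, or re-exported will hash differently and would need a different technique.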

Constraints

General Constraints

Lack of resources: Public defense offices have limited budgets and staff to acquire and maintain new technologies.

Privacy and security requirements: Privacy is critical. Not only are the costs of breaches of client privacy extremely high, but public defenders should also only have access to the surveillance data and case information used in their own cases.

Heterogeneous IT environments: The basic technical environment differs widely between public defense offices in different jurisdictions and regions. Most of the Bay Area public defenders we talked to had laptops, phones, Wi-Fi, and VPNs. In other areas, public defenders relied on landlines and aging desktops, and were forced to use their personal data plans for communicating with clients and downloading data.

How to keep prosecutors out: The judicial system is adversarial, and public defenders are wary of technologies that prosecutors might be able to access, particularly evidence-processing tools and central repositories of information.

Surveillance Data Specific Constraints

Lack of control over data formats: Public defenders have little control over the format in which digital evidence arrives, and that format is subject to change.

Explainability: In order to present findings in court, the mechanism used to arrive at those findings must be easily explainable to a judge.

Privacy of Client’s Community: In exonerating a client, surveillance data processing tools may implicate members of their community. Designers should think carefully about who else’s data besides the defendant’s may be intermixed in discovery.

Bias: The stakes of this work are high, and tools which work well only for a subset of defendants (e.g., English speakers) risk further exacerbating inequality in the system.

Policy Implications

Big Picture Reforms

Reduce caseloads by diverting cases away from the criminal justice system and increasing public defense budgets.

Reforms to Improve Transparency in Surveillance Data Acquisition

Advocate for ordinances to provide transparency and regulation around local acquisition of surveillance data systems. [1][2][3] Two important models are Oakland’s PAC Surveillance Technology Ordinance, which requires disclosure of new technologies and prohibits non-disclosure agreements, and Seattle’s 2013 ordinance, which requires city council approval for acquisition of new technologies.
Extend such ordinances to include disclosure rules for surveillance in jails, prisons, and public housing, and explicitly include mechanisms for public defenders to receive the same access as law enforcement [4][5]
Include explicit requirements for PD engagement in the acquisition of surveillance data systems [6]
Consider exceptions to trade secret and copyright laws to require that defense counsel have access to information about technologies used to process evidence used in criminal courts [3][5]

Specific Reforms to Improve PDs’ Ability to Process Surveillance Data

Advocate for major tech companies to dedicate staff to responding to requests from public defenders
Consider amending Brady laws to require that evidence be disclosed in a format that is consumable by defense counsel, together with any technology required to process it (e.g., if Cellebrite evidence is used, PDs must have software to view it [2])
Amend existing privacy legislation to give public defenders the same exemptions as law enforcement when getting access to stored communications [6]
When possible, reduce exemptions for law enforcement for new and existing privacy-protecting legislation.

It’s not privacy laws or technical hurdles [which make social media data hard to get], it’s Facebook and Google being dicks. They will give law enforcement stuff without a warrant, but they just won’t respond to our subpoenas very often!

– Felony Public Defender

References

[1] Mulligan, D. K., & Bamberger, K. A. (2018). Saving Governance-By-Design. California Law Review, 106(3). https://doi.org/10.15779/Z38QN5ZB5H
[2] Greene, D., & Patterson, G. (2018). Can we trust computers with body-cam video? Police departments are being led to believe AI will help, but they should be wary. IEEE Spectrum, 55(12), 36–48. https://doi.org/10.1109/MSPEC.2018.8544982
[3] Owens, K., Cobb, C., & Cranor, L. F. (2021). “You Gotta Watch What You Say”: Surveillance of Communication with Incarcerated People. 18.
[4] Joh, E. E. (2017). The Undue Influence of Surveillance Technology Companies on Policing. New York University Law Review Online, 92. https://doi.org/10.2139/ssrn.2924620
[5] Wexler, R. (2017). Life, Liberty, and Trade Secrets: Intellectual Property in the Criminal Justice System. Stanford Law Review, 70. https://doi.org/10.2139/ssrn.2920883
[6] Wexler, R. (2019). Privacy Asymmetries: Access to Data in Criminal Investigations (SSRN Scholarly Paper ID 3428607). Social Science Research Network. https://doi.org/10.2139/ssrn.3428607
