By Dr Tristan Jenkinson

Introduction

I haven’t posted recently due to work commitments and other personal projects. I have been working on a few things (including taking a course on the Ethics of AI), and I have a whole list of blog posts that I still want to get around to writing (neither the “Importance of data that doesn’t exist” nor the ChatGPT series is done yet!). One thing that I had been working on was a webinar for EDRM on Generative AI. It covers some content that I mention in the ChatGPT series, but also has some other bits and pieces in. If you want to take a listen you can find it here.

I’ve also been working on a separate article on self-collection that is taking a while to finish, so keep an eye out for that in the future.

With Halloween looming, and having failed to post much recently, I wanted to put together a collection of a few horror stories that I have seen and heard throughout my career so far. This is to highlight mistakes that have been made in the past, so that others can avoid those particular pitfalls in their own work. For that reason, in most of these cases, I’m not providing specifics or links as I would normally.

The Case of the Mangled Metadata

In this case when preparing data to be provided to a regulator, the legal team took steps which unfortunately resulted in some metadata for a key document being removed, and new dates and times being inserted, leading to concerns that the document might have been manipulated.

Rather than creating a production, or exporting from the eDiscovery platform in use, the legal team downloaded a native copy of an email to be produced. This “production” email appeared to have then been sent via email within the law firm, before being prepared for production to the regulator.

Unfortunately, it appears that a metadata scrubber (a tool used to remove metadata from outgoing emails and attachments) was in place on their email system. It is not uncommon for legal teams to utilise such systems, and there are various software solutions for this, for example Litera Metadact (https://www.litera.com/store/metadact/). A metadata scrubber would have looked at the attached “production” email and identified that it contained a Microsoft Word attachment. The scrubbing system would then have removed metadata from that document, in this case apparently updating the Created and Last Modified values in the process.

The result was that when the email was produced to the regulators, internal metadata of the Microsoft Word attachment had been removed, and the Created and Last Modified dates in the Word document had been updated to shortly before the production, years after the email to which the document was attached had been sent. This naturally led to concerns that the document could have been manipulated.
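A sanity check for this kind of problem can be sketched in a few lines. The example below is a minimal sketch rather than a forensic tool. It assumes the attachment is a .docx file (which stores its Created and Last Modified values in docProps/core.xml inside the zip container), and the function names docx_core_dates and dates_after_send are hypothetical, chosen purely for illustration. It flags any of those dates that fall after the time the parent email was sent.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace of the created/modified elements in docProps/core.xml.
DCTERMS = "{http://purl.org/dc/terms/}"


def docx_core_dates(docx_bytes):
    """Return (created, modified) datetimes from a .docx file's core properties."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))

    def parse(tag):
        node = root.find(DCTERMS + tag)
        if node is None or not node.text:
            return None
        # Values look like 2023-10-01T09:30:00Z.
        value = datetime.fromisoformat(node.text.strip().replace("Z", "+00:00"))
        if value.tzinfo is None:
            # Assumption for this sketch: treat values without an offset as UTC.
            value = value.replace(tzinfo=timezone.utc)
        return value

    return parse("created"), parse("modified")


def dates_after_send(docx_bytes, email_sent):
    """Names of the core dates that are later than the parent email's sent time.

    email_sent must be a timezone-aware datetime.
    """
    created, modified = docx_core_dates(docx_bytes)
    flagged = []
    for name, value in (("created", created), ("modified", modified)):
        if value is not None and value > email_sent:
            flagged.append(name)
    return flagged
```

An attachment that reports being created after the email carrying it was sent is not proof of manipulation, as this story shows, but it is a signal that the file has passed through something that rewrote its metadata. Older .doc files store these values differently and are not covered by this sketch.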

This was not the only issue, however. The “production” email itself was left with fingerprints from the law firm on it – including personal information and email addresses, resulting from the “production” email being sent as an attachment through their email system.

Thankfully it was possible for the original email and attachment to be reprovided from the eDiscovery system without any undue alterations or fingerprints added.

This should, however, serve as a reminder that data, whether for litigation or otherwise, needs to be handled with care.

I have written about another similar example of a legal team’s metadata scrubbing tools causing problems in litigation previously, which you can read here.

Near-Deduplication Nightmares

This horror story relates to a case that I didn’t work on, but was made aware of. It raised concerns about misunderstandings of how and when certain areas of eDiscovery technology can and should be used.

Disclosure was received from the opposing side. It was noted on receipt that families had been split, which was unusual. More concerning was that an email relevant to the matter had been provided without the attachment that it should have included.

The missing attachment was queried and the response was a surprising one. The legal team explained that when they raised the query with their eDiscovery provider, the provider had stated that the attachment was not provided because it was a near duplicate of the parent email.

This immediately raised a number of red flags:

Near deduplication is not typically used as an automated method to exclude files from production.
It would be (relatively) unusual for an email attachment to be a near duplicate of the email it is attached to.
If near deduplication was being used to exclude files, this means that files similar to those that have been deemed relevant have been excluded.
It is unusual for families to be split. Typically, if the email is considered relevant, then its attachments would be provided.

Near deduplication usually works by comparing the textual content of files, calculating a percentage of similarity and grouping files which are statistically similar (i.e. above some specified value).
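To make the idea concrete, here is a minimal sketch of that kind of grouping. It is an illustration only: eDiscovery platforms use their own (usually proprietary) similarity algorithms, and I am using Python’s difflib as a stand-in. The function names, the word-by-word comparison, and the 90% default threshold are all assumptions for the example.

```python
from difflib import SequenceMatcher


def similarity(text_a, text_b):
    """Percentage similarity between two texts, compared word by word."""
    return 100.0 * SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


def group_near_duplicates(docs, threshold=90.0):
    """Group documents whose text is at least `threshold` percent similar.

    docs maps a document ID to its extracted text. Each document is compared
    against the first member (the "pivot") of each existing group and joins
    the first group it matches, otherwise it starts a new group.
    """
    groups = []  # list of (pivot_id, [member_ids])
    for doc_id, text in docs.items():
        for pivot_id, members in groups:
            if similarity(docs[pivot_id], text) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((doc_id, [doc_id]))
    return [members for _, members in groups]
```

Note that the output is a set of groups for a reviewer to look at, not a decision about which files to keep or drop. A document that differs from its pivot by a single sentence will still land in the same group, and that single sentence may be the one that matters.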

It would therefore seem strange for an email attachment to be a near duplicate of its parent email, as it would need to contain mostly the same text as the parent email. This would suggest that the attachment was likely an email containing much of the same history as the parent email. In turn this suggests that if the parent email is relevant, then the attachment would likely be relevant as well.

Regardless of the attachment itself, the concept of using near deduplication as an automated method to exclude files from production causes concerns and it seems a highly dubious use of the technology.

For example, this would likely mean that if there was a long thread of emails discussing how to defraud a competitor, you could have a key custodian replying saying “This is illegal, but I think we can get away with it, let’s do it”. If other emails from the thread are produced, then this email may not have been provided on the basis of it being a near duplicate of other documents being produced.

Near deduplication is more typically used to group files together for review, to identify groups of data that are likely not relevant, or groups of data that are relevant.

It is possible that this explanation from the eDiscovery provider was the result of some misunderstanding between the opposing law firm and their eDiscovery provider. For example, it could be that the explanation was that they had run deduplication at an item level, and the attachment had been deduplicated against a copy of the file elsewhere in the production.

This explanation would still not be without its problems. Item-based deduplication means that every file, even an attachment, is included in the review site just once. This was once more commonly seen (even if it was not popular) but is unusual in current eDiscovery cases. We now typically see family level deduplication, meaning that where an email is included its attachments are included with it, even if they are also supplied elsewhere.
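The difference between the two approaches can be shown with a small sketch. This is a simplified illustration under my own assumptions: each family is a list of (name, content) pairs with the parent email first, and files are compared by a SHA-256 hash of their raw bytes. Real platforms typically hash emails on a selected set of metadata fields rather than raw bytes, and the function names here are hypothetical.

```python
import hashlib


def item_level_dedup(families):
    """Keep each unique file once, regardless of which family it belongs to."""
    seen, kept = set(), []
    for family in families:
        for name, content in family:
            digest = hashlib.sha256(content).hexdigest()
            if digest not in seen:
                seen.add(digest)
                kept.append(name)
    return kept


def family_level_dedup(families):
    """Keep or drop whole families. A family is only a duplicate if every
    member, in order, matches a family that has already been kept."""
    seen, kept = set(), []
    for family in families:
        key = tuple(hashlib.sha256(content).hexdigest() for _, content in family)
        if key not in seen:
            seen.add(key)
            kept.append([name for name, _ in family])
    return kept
```

With item level deduplication, a second email carrying a copy of an already-seen report is kept without its attachment. With family level deduplication, that email keeps its attachment, and only an exact repeat of a whole family is suppressed.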

There are several reasons why family level deduplication is typically preferable. A lot of this comes down to the idea of context. If an email is relevant, then its attachments may be relevant, either directly, or because they provide more context to the email itself.

There are also risks with splitting families. If you say that an attachment has not been produced, because it is a duplicate of another file, then you should ensure that it has been included in the production. If that file is located elsewhere (say under a different custodian’s email), then there is a risk that it may not be included in the review, and could be missed in the production.

In addition, you also have to make sure that the file can be identified in the production. If you read the email and then want to see the attachment, there would need to be a mechanism for identifying the correct file. If the file is supplied separately, there needs to be an indication of where in the production it is contained – for example, by listing the hash of the attachment, and using this to link to a file. In my experience, this link was typically not included in productions where item level deduplication had been applied.

As a final note, the potential misunderstanding between eDiscovery provider and law firm raises another point: the importance of clear communication. There are many more horror stories that stem from this simple issue, and its importance as a lesson cannot be overstated.

A Frightful Forensic Fiasco

This is a case that is currently in court which I am aware of, but have no involvement with. It highlights the importance of performing forensic collections correctly and in accordance with instructions and agreed protocols, as well as considerations regarding best practice.

In this matter, it is alleged that an eDiscovery provider reached an agreement on a collection protocol with a custodian regarding the collection of their email data. That agreement stipulated that a date range and a series of search terms would be applied “to the account”, and were to be “applied to the email address fields only”. Further, it was made clear in email that “Only the results of that search are collected”.
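To illustrate what a protocol of that shape means in practice, here is a minimal sketch of a filter that applies a date range and then search terms to the address fields only. It is purely illustrative and is not a description of what either party did or of any tool that was used. The function name in_scope, the choice of From, To, Cc and Bcc as the address fields, and the substring matching are all my own assumptions.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Assumption for this sketch: these headers are "the email address fields".
ADDRESS_FIELDS = ("From", "To", "Cc", "Bcc")


def in_scope(raw_message, start, end, terms):
    """True if the message is inside the date range AND a search term appears
    in one of its address fields. Subject and body are never searched.

    start and end must be timezone-aware datetimes, and the Date header is
    assumed to carry an explicit UTC offset.
    """
    msg = message_from_string(raw_message)
    date_header = msg["Date"]
    if date_header is None:
        return False
    sent = parsedate_to_datetime(date_header)
    if not (start <= sent <= end):
        return False
    headers = []
    for field in ADDRESS_FIELDS:
        headers.extend(msg.get_all(field, []))
    addresses = [addr.lower() for _, addr in getaddresses(headers)]
    return any(term.lower() in addr for term in terms for addr in addresses)
```

Under a protocol like this, an email that only mentions a search term in its subject or body is out of scope, as is anything outside the date range. Only the messages for which the filter returns True would be collected.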

The eDiscovery provider, however, collected all of the custodian’s emails in the date range. This meant that over 34,000 personal emails were included in the data preserved by the service provider.

Emails which have been filed in relation to the case suggest that the decision was made unilaterally by the service provider… an email from the forensic analyst performing the work said that:

“There were numerous attempts to apply the search terms at the point of collection; however, we found that the best practice was to apply only the date range”.

This sounds a little strange, as the start of the sentence suggests that the attempts to apply the search terms ultimately failed, but that doesn’t appear to be what has happened. Surely if this was the case, they would have said that it was not possible. Instead, the analyst switched to saying that they changed approach because of “best practice”.

I would agree that, in general, best practice is not to apply search terms within an email system at the point of collection. Typically you would suggest collecting the full mailbox (with a date range in place) and searching in a dedicated tool. However, forensic collections should only be carried out in line with instructions and agreed protocols, even if that means going against best practice. Based on the information in the filings, the forensic analyst performing the collection raised with the custodian, prior to the collection, the possibility of collecting the full account and applying the search terms later. Again based on the filings, the custodian made it clear (in email) that the search was to be performed at the point of collection.

Unfortunately, that is not what happened, with the forensic analyst apparently deciding themselves that they would collect the full data set, without the knowledge or consent of the custodian, and with full awareness of the agreed protocol. As a result, the forensic analyst and the eDiscovery service provider are being sued.

This really highlights the importance of ensuring that your forensic staff understand their legal obligations when it comes to data collection. Forensic collection has many complex facets which are not always appreciated. Without a proper understanding and consideration of forensic and legal principles, data collection can carry a large number of legal risks and pitfalls. This is not just in relation to potential over-collection, but could be in regard to cross border movement of data and related data privacy issues, or issues stemming from legal access and computer misuse. This is in addition to understanding the best practice and approaches for data collection from an ever expanding array of potential data sources. The benefits of a properly trained, knowledgeable digital forensic team, who understand these issues and can ensure that data collections are performed correctly, cannot be overstated. Professional digital forensic teams rarely get the credit they deserve.

It is also worth reiterating that while digital forensic and eDiscovery teams can advise their clients and related parties on best practices, ultimately they can (or at least should) only act on their instructions. This is the case even if the client does not want to act in line with best practices (so long as the instructions are legal). This can become a complex topic, and may be something that I look to discuss further in another post, as there is far more to be said.

A Horrific Historic Howler

I wanted to include this as it’s a case that I originally intended to write about many years ago, having heard about it through an excellent article on Civil Litigation Brief (referenced below). This is again a matter in which I have (thankfully) had no involvement. It is one of my favourite examples of how not to approach disclosure, and is a really interesting example of what digital forensic teams have to deal with when working on site.

The case is, at a high level, a dispute between taxi firms, which has a history relating to the defendants’ repeated failures to provide disclosure. The legal case dates back to 2016-2017, but it remains a good example of how not to approach disclosure. Ultimately the entire defence and counterclaim was struck out, and two of the defendants were debarred from defending the claims at all.

There is a great summary of the disclosure issues in Civil Litigation Brief which you can read here, though if you want to read the full judgment, you can find that here. There was also an interesting follow up, considering what the defendants would be able to do at trial, if their defence was struck out (in short, not much) – you can read that article on Civil Litigation Brief here.

Before we get to the really interesting part, about what happened when the digital forensic team arrived to image the defendant’s machines, it is helpful to consider some of the history of disclosure that led to that point.

The disclosure history begins with an injunction (in November 2014) that states that the defendants will preserve “all electronic files and associated computer hardware within their possession or control…”

The defendants later, in March 2015, confirmed, through their solicitors that:

“our client has retained all electronic files and data that was stored within the Second Defendant’s computer system and will be kept safe and will not be destroyed pending the finalisation of the trial or further Order”

A disclosure list was originally to be supplied on 17 June 2016, with inspection on 1 July 2016. The subsequent disclosure list from the defendants was for just 33 documents, and did not include much of what was expected, including documents previously disclosed. There was much missing, and consequently the claimants sought and obtained an unless order (effectively an order that states you will face sanctions “unless” you do this), detailing various searches and disclosures to be made by the defendants.

In the literal final hour before these disclosures were due (at 4pm on 28 October 2016), the defendants sent out 16 emails to the claimant, stating they were compliant with the order. Hardcopy versions were provided on 3 November 2016, together with a USB stick.

The USB stick was analysed by an IT consultant and (in the words of the judge) “had upon it no data whatsoever”.

The claimants wrote a letter to the defendants, laying out a detailed three page list of documents and categories of documents which were yet to be disclosed. The claimants subsequently issued an application for the defence and counterclaim to be struck out on 21 November 2016.

Prior to the application, on 19 November an order was made which required the defendants to give access to their systems, stating:

“[the defendants] shall by 4pm on 7 December 2016 permit the Claimant’s appointed IT and accountancy experts to access and inspect all and any computers and computer systems within their possession or control to search for and take copies of any documents… [r]elevant to the issues in the dispute”.

It is worth noting that the date on which the appointed IT experts were to collect the data was just one week before a hearing on the case.

When the forensic team arrived to perform the collection, they were informed that there had been a cyber attack in May 2016 (i.e. after the warning on preserving data in November 2014 and the confirmations given in March 2015, and prior to disclosure being provided).

As a result, all of the original computers had been replaced with entirely new ones, and the old hard drives had been removed, drilled through, and disposed of. No attempt to transfer or preserve data had been made and there were no backup tapes. The claimants were told that any data prior to May 2016 was irretrievably lost. Neither the claimants nor the court had been aware of this until this point.

The judgment notes that the defendants’ lawyers were made aware of the cyber attack after the order, but before the forensic team arrived on site, yet allowed the inspection to proceed without informing the court or the claimants that it was likely to be a wasted and costly exercise.

There is, however, further intrigue regarding the “cyber attack”. David Bolton was a self-employed IT consultant who assisted the defendants’ business with software and computer systems. It transpires that Mr Bolton wrote an article for a tech website about the work that he performed.

The article states:

“The malware infected four PCs at the central office and two at satellite offices; the other six weren’t touched. The damage to these infected PCs was remarkably light: the log files (.log) were all encrypted, as well as one config file (.txt) that the server used for mapping East London into booking zones. After replacing that file, the server was able to run. The only loss was the log files.”

and

“The #Decrypt My Files.html contained a message asking for 1.2 Bitcoins (about $500) to recover the PC, including details on how to pay. No ransom was paid. The Taxi firm’s Managing Director already had a plan to replace all PCs in a few months, as most were six to eight years old. That plan was accelerated, and all 12 PCs were replaced one week after the initial infection. I returned a week later to help replace the PCs and to my surprise discovered that no further infections had occurred since the first one. It’s my belief that the malware just ran once from one PC and managed to infect five others. But it wasn’t permanent, and didn’t reload after a reboot, so the malware was gone.”

The article makes it clear that the impact was light, affecting only a few files, and that the malware did not infect several of the machines at all. In addition, it had been possible to recover the machines and resolve the issue, apparently entirely.

The article also makes plain that there were already plans to replace the machines. The judge highlights in his judgment:

“In other words the destruction of the existing computer system was due to a desire to update the system and not the malware”

He further states:

“I am in no doubt but that the information available on those computers could have been highly relevant to the Claimants’ case”

As above, the judge decided that the defendants were in material breach and struck out the defence and counterclaim. Two of the defendants were also debarred from defending the claims.

The case is a great example of how disclosure should not be performed, with many delays and failures throughout on the part of the defendants and their legal representation. While much of this sits on the legal side, it demonstrates that judges can and will stop parties from defending claims against them if they repeatedly act in breach of their disclosure obligations.

eDiscovery Horror Stories 2023
