Legal Data Intelligence – Initial Thoughts

By Tristan Jenkinson

Introduction

Since it debuted at the CLOC Global Institute earlier this month, there has been a lot of talk about the new Legal Data Intelligence framework, and rightly so. The framework is envisaged as a new approach providing “a vocabulary, framework, and best practices to manage legal data”. You can see the main Legal Data Intelligence site here. One of the areas that has caught the most attention is the set of models that have been put together, breaking down the steps involved in some of the most common uses of legal data analysis.

Intelligence Led

For over a decade I have been advocating for what I describe as an “intelligence led” approach to eDiscovery and investigations, using data to enable strategic decision making. So many details of the new framework really resonated with me. The team behind the new framework put together a whitepaper which talks about the rise of legal data intelligence practitioners, the bullet points of which very much align with work that I (and others) have been doing in the eDiscovery and investigation spaces for some time:

  • Applying their skills in managing large volumes of data to a broad range of disparate legal challenges
  • Effectively organizing and aligning people, processes, and technology to meet time-sensitive deadlines
  • Defining and demonstrating success in a quantitatively rigorous way
  • Finding ways to accelerate business while still mitigating legal risk
  • Building defensible processes that lawyers can rely on
  • Being an emissary with a vision not just a functionary

As noted by Joy Holley writing on eDiscovery Today, this approach is not presented as a fait accompli, but as a framework to be further developed:

 “I understand the feedback some have shared that the model is not fully baked; to practitioners at the progressive organizations that tend to gravitate to events like CGI, the website’s content may seem like scaffolding without much substance. In this time of rapidly evolving innovation, however, I respect the committee’s willingness to release what they readily acknowledge is a work-in-progress, and to solicit input from other industry professionals. Those of us who focus our practices in these areas need to lean in to build out standards.”

There is definitely some great content here, but it is also good to see the willingness to build and develop. I have some thoughts on areas that I would like to see discussed further, which admittedly lean towards the data collection side, given my personal experience.

Left v Right Hand Side Dominance?

As a side note, I do sometimes feel that the eDiscovery community is predominantly right handed, i.e. focused on the right hand side of the EDRM. With my experience on the forensics side, my viewpoint is somewhat biased toward the left hand side. Ultimately, regardless of how fantastic the technology implemented within a review platform may be, if you have not collected the correct data, or have not collected the data correctly, this can have a hugely negative impact on your matter. Rubbish in means rubbish out, regardless of how good the process may be.

A More Forensic Focus

One thing that I was a little disappointed to see was the “Collect Data” section, especially for litigation. A relatively simple description of “Gather data from identified sources” could cause some potential problems down the line. I am currently writing an article regarding the potential risks of self-collections and the importance of proper data preservation and documentation. It would be good to see this area fleshed out with more focus on ensuring that the collection step is performed such that the data can be successfully used in court and does not create potential legal issues for the matter in the future. It may also be helpful to acknowledge possible complications in some approaches to data collection here, to mitigate the risk of parties damaging their case at the very beginning.

Another disappointment in the “Collect Data” area was the note under the use of technology stating that the technology “Pulls data from sources into a platform for processing”. Unfortunately (and I admit I have a bias in this area) this is somewhat overly simplistic, and certainly not always the case. In many cases, data has to be collected from devices and then exported (potentially in a different format) for processing in a platform. Yes, there are review platforms that can pull data from sources directly, but there are potential risks with this approach.

I would love to see this area in particular built upon, with an outline for a more forensic, defensible approach, especially as “building defensible processes that lawyers can rely on” is one of the attributes listed for Legal Data Intelligence practitioners (rant over!).

Potential Inconsistencies

I did note some potential inconsistencies, such as the inclusion of legal hold notices in internal investigations, but not for litigation, and a slightly more fleshed out Collect Data section in investigations compared to litigation. Not a big deal, and this may be the compromise to having different models for different use cases, but something to consider.

Separately, if legal hold notices are included, it would be great to also see something on monitoring legal hold compliance. Unfortunately just issuing legal hold notices is not enough in many cases, and a note on potential tipping off issues if legal hold notices are provided may also be worth including here.

Data v Documents

It is good to see a focus here on data, not documents. There is also a (warmly welcomed) specific mention of structured data, in addition to the unstructured data that is typically the focus of an eDiscovery-type matter.

Historically there has often been a focus on email data and typical documents (often Microsoft Office files and PDFs). This can often mean missing key information that increasingly sits in other data sources, such as those mentioned explicitly by the Legal Data Intelligence team (including, for example, WhatsApp, Slack, and Teams).

There has been more movement towards considering more disparate data sources in recent years, but this area is still often missed where cookie cutter/sausage factory approaches to eDiscovery focus on more “traditional” data sources. I do feel that an increasingly broad range of data sources needs to be considered when matters are being scoped, and these will often differ depending on the details of the matter at hand.

I would like to see data used more strategically, linked to my “intelligence led” approach. For example, traditionally digital forensic approaches can be implemented to support eDiscovery matters. This could be used to identify additional data sources (as I believe is mentioned in the Legal Data Intelligence documentation), but there are other case examples as well, such as the recovery of deleted data in more investigatory matters.

One of the most common examples I have given of using data for eDiscovery strategy concerns network access. Say, for example, a significant custodian in a major investigation has access to a 10TB network and is not providing any information about where they stored (or may have stored) relevant information. Using digital forensic investigation techniques, it is possible to identify the folders across the network that the user has traversed. This information can be used to prioritise those areas of the network. There are obviously caveats to taking this sort of approach, but it is a method that can be hugely beneficial.
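As a rough illustration of this kind of prioritisation, the sketch below ranks network folders by how often a custodian appears in access records. The record format, function name, and sample paths are all hypothetical; in practice this information would come from forensic artefacts such as file-server audit logs or operating system traces, and interpreting those properly requires forensic expertise.

```python
from collections import Counter

def prioritise_folders(access_records, custodian):
    """Rank network folders by how often a custodian accessed them.

    access_records: list of (user, folder_path) tuples, e.g. parsed from
    file-server audit logs (hypothetical format for illustration only).
    Returns folder paths in descending order of access count.
    """
    counts = Counter(path for user, path in access_records if user == custodian)
    return [folder for folder, _count in counts.most_common()]

# Fabricated example records for illustration
records = [
    ("jsmith", r"\\fileserver\projects\alpha"),
    ("jsmith", r"\\fileserver\projects\alpha"),
    ("jsmith", r"\\fileserver\finance\2023"),
    ("akhan",  r"\\fileserver\hr\reviews"),
]

# The custodian's most-visited folders come first, giving the team
# a data-driven starting point for collection and review scoping
print(prioritise_folders(records, "jsmith"))
```

This is only the simplest possible form of the idea; real matters would also weigh recency of access, folder size, and the caveats mentioned above.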

The Dangerous “Linear” Assumption

The Legal Data Intelligence models are broken into three sections: Initiate, Investigate, and Implement, each of which contains a number of steps. There is a danger that, without this being spelled out, some may assume that all of these steps (and sections) are linear, with one following the other in sequence. In my experience, it is unusual for this to be the case. New data to collect or additional custodians may be identified during the review, or there may be priority custodians whose data needs to be reviewed for specific information or issues and produced immediately, before any next steps. Most cases are iterative in nature, some extremely so.

That is not to say that the team intends the models to be interpreted as linear, and I am certain that this is not the case. It is also worth noting that I have repeatedly seen the EDRM model misinterpreted in this way.

The assumption may be that the teams referring to the models will take it as read that they are not designed to be purely linear, but this can be a dangerous assumption to make.

SUN and ROT

The framework uses the terms SUN and ROT for categories of data. It adopts the existing information governance term ROT, which refers to data that is Redundant, Obsolete or Trivial, and introduces the (somewhat opposing) term SUN, for data that is Sensitive, Useful or Necessary, explaining:

 “… the data challenge that is at the heart of each of these use cases: finding and acting on the sensitive, useful, and necessary (SUN) data within all the redundant, obsolete, and trivial (ROT) data that constitutes the bulk of data in every organization.”

I was initially a little concerned about the SUN term, since Sensitive data may be data that you would need to exclude, potentially even at the collection stage, especially where GDPR data minimisation applies.

However, it is clear that the team is on top of this, having included that SUN is designed to include data that is Sensitive, Useful, or Necessary, or a combination of at least two of the three. It’s data that has a higher likelihood of being relevant to the legal task at hand. What qualifies as SUN data can change depending on the nature of the task.

It was reassuring to know that this was already built in, even if it means that my initial section header “Too much SUN can be bad for you” is no longer applicable!

Conclusion

The Legal Data Intelligence framework has been welcomed by the community, and I see it as a very advantageous addition to the legal technologist’s arsenal. I would love to see further development, especially on the forensic side.

Time will tell if this is truly a new paradigm in eDiscovery, but it is certainly a helpful, and I think necessary, step towards building a better future in the legal space.
