The Importance of Data that Doesn’t Exist – Part Two (Missing Data Sources)

By Dr Tristan Jenkinson

Introduction

In part one of this series of articles, I spoke about the importance of data that is not present, and the use of timelines in investigating where missing data may exist. In this article, I discuss data that may be missing (or may appear to be missing).

Those working in eDiscovery will be familiar with the situation when your client’s IT team have checked one of their systems, and reported that there is no data for one (or more) of your custodians. This is especially common where data sets being sought are historic, as the risk of data having been lost or deleted typically increases with the passage of time.

In such cases, it is important to understand why the data appears not to be present. It can be tempting to take the view that, if it isn’t there, then you don’t need to collect or review it. Taking such an approach can leave you open to questions from the other side further down the line about why the data was not found and reviewed. Worse, the data might actually be present, just not where it was expected to be. In other cases, you may be aware that the data for that custodian will be key to your case, so understanding why it has not been found can assist in identifying where it might be located.

I mentioned in part one a matter I worked on where audio data sets were found to be missing, and upon investigation this was identified to be due to system malfunction. That case highlights one of the reasons why it is key to understand missing data sets. While system malfunction may be one reason that data is lost, or no longer available, there are many more possibilities that should be considered when expected data sets are missing, or do not cover the period expected.

In this short article I discuss some common causes that lead to apparently missing data.

Past Email Systems

Historically, email systems (especially on-premises email servers) had far less storage space available than the email systems we see in use now. This led the IT teams managing those systems to introduce methods for conserving the limited space available. Two common approaches were to limit the size of users’ inboxes and to implement restrictive retention policies to keep the email systems clear.

If historical email data is missing, it could be due to such policies. It is worth looking into what the historic policies were, to identify whether they could explain the lack of data.

Where inbox sizes were limited, users were often encouraged to create their own archives (to mitigate the risk of key information being lost). Locating these (if they exist) is a good option to consider, which can involve identifying where they may have been stored. It is also worth considering that archives may have been created by IT (perhaps even in response to requests from users). This can expand the locations where data may be stored. I have seen unofficial file shares set up by IT purely for storing user PST archives. When looking for email archives, network “Home drives” (where used) and file shares are good locations to check, as are laptops and PCs. Digging further, it can be worth considering legacy devices and storage, or even backup data.
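As a rough illustration of checking a home drive or file share for email archives, the following is a minimal Python sketch. It assumes the share is mounted and readable from the machine running the script, and that archives can be recognised by file extension alone (.pst, plus .ost as an additional Outlook data file format). The function name and the extension list are my own illustrative choices, and renamed or zipped archives would not be found this way.

```python
import os
from pathlib import Path

# Assumption: archives are identifiable by extension. Extend this set as needed.
ARCHIVE_EXTENSIONS = {".pst", ".ost"}


def find_email_archives(root):
    """Walk `root` and return (path, size_in_bytes) for each likely email archive."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively (e.g. "Archive.PST").
            if Path(name).suffix.lower() in ARCHIVE_EXTENSIONS:
                full_path = Path(dirpath) / name
                try:
                    size = full_path.stat().st_size
                except OSError:
                    # File may be locked or unreadable; still report its location.
                    size = None
                hits.append((full_path, size))
    return hits
```

The resulting list is only a starting point for discussion with the IT team: it shows where archives sit, not who they belong to or what period they cover.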

Another approach when facing missing email data is to consider other individuals who may have been included on communications that are of interest. This may require an expansion of the custodian list. Other individuals may also have email archives (or other storage) which have not been subject to the same retention policies. For example, staff at different levels, or working in a different department, may have had different retention policies applied to them.

Retention Policies (and Policy v Practice)

As touched upon above, retention policies can be a huge consideration when looking into missing data sets. If retention policies are set (and are executed), data could (and likely should) have been deleted under the rules in place.

This can be a particular issue with email data (as discussed above), and is more of a risk when dealing with cases where the data required may be from some time ago.

Where data is missing due to retention policies, alternative sources may be difficult to identify and will depend on the specific details of the type of data. Suggestions for email are included above. For other data sets, it will likely depend on the specifics of the data source under consideration.

A point worth noting when considering retention policies is the potential difference between policy and practice, and it can be key to ensure that you identify any such differences. I have encountered many cases where questions to senior leadership about retention policies result in responses derived directly from policy, while the IT teams on the ground tell you that the policy isn’t practical, so they actually do something else. For example, the policy may be that all data stored on the servers is deleted once the project has been closed for 1 year. The IT team on the ground may explain that they have no visibility of when projects are closed, and so data is only really deleted from the server on an ad hoc basis, typically only once space is running low on the servers.

Such differences can mean that data is retained when the policy may have been for it to be deleted. In addition, it may not be in the original location. For example, the policy may be that backup data is only retained for 1 year, whereas in reality there are backups covering 10 years sitting in the bottom of a cupboard in IT. This can especially be the case where there have been acquisitions, mergers etc. in the forming of the company, meaning that policies may have changed over time.
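One quick way to test whether practice matches a stated retention period is to look for files that are older than the policy should allow. The sketch below is a simple Python example under some loud assumptions: that file modification times are a reasonable proxy for the age of the data (they may not be, for instance after a migration or a restore), and that the retention period can be expressed as a number of days. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path


def files_older_than_policy(root, retention_days, now=None):
    """Return (path, modified_time) for files under `root` that pre-date the retention cutoff.

    Assumption: last-modified time approximates the age of the data.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=retention_days)
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            modified = datetime.fromtimestamp(path.stat().st_mtime)
            if modified < cutoff:
                stale.append((path, modified))
    # Oldest first, so the extent of over-retention is visible at a glance.
    return sorted(stale, key=lambda item: item[1])
```

A non-empty result does not prove anything on its own, but it is a useful prompt for a conversation with the IT team about what is actually deleted and when.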

For this reason it is always worth speaking to the IT team on the ground where possible to ascertain the differences between policy and practice (and to ask about the drives in the cupboard at the back of the IT room).

Where data is lost through retention policies, it can also be worth considering data from (or related to) legacy systems. This is discussed further below.

Username Changes

How usernames are used (and altered) can cause a number of issues with regard to identifying data, and can lead to data being missed.

Accounts may have originally been issued under a different name (for example a maiden name), and not changed, meaning that a search for the custodian’s current name may not identify any email data. Alternatively, a new email address may have been set up when the custodian’s name was changed, in which case older emails (under the original account) could potentially have been missed.

Usernames can be changed for a number of reasons. Where employees have had several different periods of employment, or moved from one department to another, they may accumulate multiple different email addresses. Alternatively, where a company’s username format changes (due to expansion or as the result of a merger or acquisition, for example) a custodian may have been given a new username. Care should be taken to ensure that data is collected from all relevant email addresses for a custodian.
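To illustrate how quickly the list of possible usernames can grow, here is a small Python sketch that combines each known first name and surname for a custodian (current and former) with a set of username formats. The formats shown are purely hypothetical; the real ones would need to come from the IT team or from the organisation's directory, and the function name is my own.

```python
from itertools import product

# Hypothetical username formats for illustration only.
DEFAULT_FORMATS = ("{first}.{last}", "{f}{last}", "{first}{l}")


def candidate_usernames(first_names, last_names, formats=DEFAULT_FORMATS):
    """Return a sorted list of candidate usernames for every name/format combination."""
    candidates = set()
    for first, last, fmt in product(first_names, last_names, formats):
        first, last = first.lower(), last.lower()
        # f and l are the initials of the first name and surname respectively.
        candidates.add(fmt.format(first=first, last=last, f=first[0], l=last[0]))
    return sorted(candidates)
```

For example, calling candidate_usernames(["Jane"], ["Smith", "Jones"]) would produce candidates under both surnames, each of which can then be checked against the email system and any legacy directories.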

One method that can be used to identify that you may be missing data due to username changes (or retention policy related issues) is to look at the employment history for each of the custodians, and check that the collected email (and other data sources) matches what is expected. This could be done in conjunction with the timelines covered in the first article in this series.
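The comparison of employment history against collected data can be sketched in Python as follows. This is a minimal example assuming that employment periods are available as (start, end) date pairs and that a sent or received date has been extracted for each collected email. The 30 day threshold and the function name are arbitrary illustrative choices.

```python
from datetime import date


def find_coverage_gaps(employment_periods, email_dates, max_gap_days=30):
    """Return (gap_start, gap_end) ranges within employment that contain no collected email.

    A range is reported only if it is longer than `max_gap_days`.
    """
    gaps = []
    dates = sorted(set(email_dates))
    for start, end in employment_periods:
        in_period = [d for d in dates if start <= d <= end]
        # Bracket the email dates with the period boundaries, so that missing
        # data at the very start or end of employment is also detected.
        checkpoints = [start] + in_period + [end]
        for earlier, later in zip(checkpoints, checkpoints[1:]):
            if (later - earlier).days > max_gap_days:
                gaps.append((earlier, later))
    return gaps
```

A custodian employed for the whole of 2015 whose collected email only covers June would, for instance, show one gap from January to the start of June and another from the end of June to December. Each reported gap is then a question to investigate: a username change, a retention policy, or simply a quiet period.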

Ex-employee data

When custodians are ex-employees, tracing their data can be more complex as it may not be easy to identify. Many of the areas discussed above become factors.

You may also want to consider any retention policies relating to ex-employee devices and network data. The data of ex-employees may be supposed to be deleted shortly after they have left, to avoid data being unnecessarily retained. However, the points above regarding policy v practice are also important to bear in mind. Policy may dictate that old laptops of users are to be wiped shortly after they leave. I have seen IT teams that have not considered this practical, and laptops have either simply been stored in a cupboard, or have been reissued to other employees. In both situations it has been possible to recover data relating to the original custodian.

Another area to look into for ex-employees is legacy systems.

Often when companies upgrade their systems (for example moving from on premise Exchange email systems to Microsoft’s O365 cloud hosted solution) only current employee data is moved over. This avoids the duplication of redundant data from old employees, and helps to minimise the data to be migrated. It does mean, however, that if you are looking for data relating to employees who left prior to the migration, it may not be present on the current systems.

In the past, I have worked on cases where no ex-employee data could be found, because only the current systems were being examined. Asking the right questions may reveal that the original on premise email servers are still around, meaning that the data could be extracted from them. Where the original servers have not been retained, backup data from those servers may still exist, for example a backup taken immediately before the migration in case of disaster.

Legacy Devices and Storage

Legacy systems are not just useful for ex-employees. Where data cannot be found for current employees, legacy systems could offer another opportunity to locate this data. This is especially the case where historic data being sought has been deleted due to retention policies on the new system, or even just deleted due to the passage of time. The data may still reside on the legacy systems.

It is also worth bearing in mind that it is not necessarily just the legacy systems themselves that are worth checking. Backups of that data, even unofficial ones made immediately prior to migration, can also be a good source of data. Alternatively, if temporary copies of data were made in order to migrate over to a new system, these temporary copies may still be stored somewhere and can be a valuable source of data.

Conclusion

It can often appear that custodian data is not present. It is worth investigating further to understand the reasons that it cannot initially be located. With this information it can also be possible to locate the data, or explain why it is no longer retained.

An investigative approach to missing data can be key, especially where custodians are known to have had access to material that may be key to the case. Alternatively, where data cannot be found, even after investigating the issue, being able to explain to the court or opposing counsel why the data was not present, together with detailing the steps that were taken to investigate and recover data, is important in order to avoid potentially difficult questions later down the line.

Coming Up

In future articles I’ll look at some case studies centered around missing data, and also look at some anti-forensic considerations, as well as some implications for OSINT investigations.
