The Importance of Source Code Analysis for Investigations (Part 1)

Originally published for Legaltech News, an ALM publication, October 28, 2015, by Joe Sremack, an ALM Listing Expert

Source code analysis can provide critical insights needed to solve an investigation and answer key questions about how events occurred.
Source code analysis is a powerful tool that can answer questions that traditional investigative methods, such as document review and data analysis, cannot. Traditional methods answer questions about the who, what, where, when, and why of a matter, but may not fully answer how certain events occurred. Source code can be found within any organization, and many organizations are increasingly reliant on creating and customizing their own software. In Part 1 of this two-part series, Joe Sremack discusses the role of source code analysis for investigations.

What is source code?

Source code is a set of text-based computer instructions written in a human-readable programming language and compiled or interpreted to perform one or more tasks. Source code statements follow the programming language’s syntax and semantics rules. There are hundreds of known programming languages (thousands if you count obscure and task-specific languages), used for different purposes and each with its own syntax and semantics. Once source code is written, it can be executed either by being compiled into an executable program or at runtime by an interpreter that translates the code into computer operations.

The format of source code depends on the language and Integrated Development Environment (IDE) used. Some source code is simply one or more text files. This is commonly the case for scripting languages, such as Python and Ruby. Other source code can combine text files and non-text file objects, such as pre-compiled libraries, GUI design files, and system configuration files. Compiled languages often have these non-text objects, which are combined in an IDE. Source code that consists only of text files can be analyzed with any text editor, but the non-text file objects may require specialized software to view.

Source code is as varied as the different types of software. Source code can be written for mainframe computers, personal desktops and laptops, servers, virtual environments, websites, business intelligence platforms, data transfer processes, data-centric mobile applications, and so on. Each environment can have a host of different types of software created for it, each with different programming languages. The source code for each can be analyzed to answer questions about how the software operated and what was performed.

Source code can be created by various people in different roles. Because code comes in many different forms, it is not only created by software developers and specialized programmers. While highly specialized, complex software may only be created by programmers, other types of source code can be created by people in different roles. Database queries, small scripts, and batch programs can all be created with relatively little programming knowledge or experience. This is important for investigations, because the investigator needs to consider who could potentially write source code and the types of programs that could be written. For example, an employee in a company’s payroll department may use Excel VBA to write logic that generates ghost employee records, and that code could be critical to the investigation.
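
Because source code can take many forms and be written by many kinds of employees, a first practical step is often to inventory what exists. The following is a minimal sketch, not a definitive procedure: the extension list and category labels are illustrative assumptions, and the function names `inventory_code_files` and `summarize` are hypothetical. A real engagement would tailor the list to the systems in scope.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping of file extensions to a rough category.
# This list is an assumption for illustration, not an exhaustive standard.
CODE_EXTENSIONS = {
    ".py": "script",
    ".rb": "script",
    ".bat": "batch program",
    ".sql": "database query",
    ".bas": "exported VBA module",
    ".xlsm": "macro-enabled workbook",
}


def inventory_code_files(root):
    """Return (path, category) pairs for files whose extension suggests code."""
    found = []
    for path in sorted(Path(root).rglob("*")):
        category = CODE_EXTENSIONS.get(path.suffix.lower())
        if path.is_file() and category:
            found.append((path, category))
    return found


def summarize(found):
    """Count the inventoried files per category."""
    return Counter(category for _, category in found)
```

Calling `summarize(inventory_code_files("/path/to/collected/files"))` gives a per-category count that can help decide which files need a text editor and which need specialized tools. Matching on file extensions alone will miss renamed files and code embedded in other documents, so the result is a starting point only.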

Why analyze source code?

Source code is valuable for investigations for a number of reasons.

First, source code contains information about the logic and business rules used to perform various operations. The operations of an organization may be described or documented, but those may not match the actual operations. Source code can be used to reveal the actual operations. For example, a healthcare company may claim that it does not modify certain types of medical records. If it relies on custom software for its medical record processing, that claim can be tested by reviewing the medical record processing software’s source code.

Second, the source code for key business operations contains information about the location and nature of the data used for specific operations. In an adversarial investigation, an investigator can locate key data repositories via the source code, rather than simply relying on potentially deceptive interview subjects. This enables an investigator to identify key data sources more completely and effectively.

Third, source code can be used to aid the data analysis process. The investigator can use the logic from the source code to determine the types of data to analyze and uncover relationships between various data sets. These insights can be used to understand business rules and help identify critical elements in the data that might otherwise go unnoticed.

Fourth, source code can be analyzed in relation to the data to identify discrepancies. Source code analysis can yield insights into the business rules for how the data should be stored. If an investigator is confident that data should not have been modified by anything except for that program, the data can be tested in relation to the business rules in the source code to identify anomalies. These anomalies, in turn, may point to non-standard or fraudulent activity performed outside of the business rules.

Other examples of goals for source code analysis include:
  • Analyzing similarities and differences between two sets of source code as part of an intellectual property dispute
  • Analyzing how a program’s behavior evolved over time
  • Locating security flaws
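
The fourth reason above, testing stored data against the business rules found in the source code, can be sketched as follows. This is an illustration under stated assumptions: the payroll rule (net pay equals gross pay minus deductions, rounded to cents), the record layout, and the names `expected_net_pay` and `find_anomalies` are all hypothetical and stand in for whatever rule an actual code review recovers.

```python
from decimal import Decimal


def expected_net_pay(gross, deductions):
    """Hypothetical business rule recovered from a payroll program's code."""
    return (gross - deductions).quantize(Decimal("0.01"))


def find_anomalies(records):
    """Return records whose stored net pay differs from what the rule yields."""
    return [
        r for r in records
        if r["net_pay"] != expected_net_pay(r["gross"], r["deductions"])
    ]


# Made-up sample records for illustration only.
records = [
    {"employee_id": "E001", "gross": Decimal("4200.00"),
     "deductions": Decimal("950.25"), "net_pay": Decimal("3249.75")},
    {"employee_id": "E002", "gross": Decimal("3900.00"),
     "deductions": Decimal("800.00"), "net_pay": Decimal("3600.00")},
]

anomalies = find_anomalies(records)
```

Here the second sample record is flagged because its stored net pay does not match the rule. A flagged record is a lead rather than a finding: it shows only that the value was not produced by the rule as the investigator understands it.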

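For the first goal in the list above, comparing two sets of source code, a simple starting point is a line-level text comparison. The sketch below uses Python's standard `difflib` module. The function names and sample snippets are hypothetical, and a comparison for an intellectual property dispute would likely need more than this, such as handling renamed identifiers and reformatted code.

```python
import difflib


def similarity_ratio(source_a, source_b):
    """Return a line-level similarity score between 0 and 1."""
    lines_a = source_a.splitlines()
    lines_b = source_b.splitlines()
    return difflib.SequenceMatcher(None, lines_a, lines_b).ratio()


def line_diff(source_a, source_b, name_a="a", name_b="b"):
    """Return unified-diff lines showing what changed between two sources."""
    return list(difflib.unified_diff(
        source_a.splitlines(), source_b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm="",
    ))


# Made-up snippets standing in for two versions of a program.
version_a = (
    "def total(items):\n"
    "    result = 0\n"
    "    for item in items:\n"
    "        result += item\n"
    "    return result\n"
)
version_b = version_a.replace("result += item", "result += item * 2")

score = similarity_ratio(version_a, version_b)
changes = line_diff(version_a, version_b, "version_a", "version_b")
```

The same functions applied to successive versions of one program give a rough view of how its behavior evolved over time, which is the second goal in the list.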

Investigators should consider source code when conducting investigations. Numerous forms of source code can exist, and since many organizations have customized software that performs business operations, the source code may be a valuable source of information. Source code analysis can help validate data analysis, identify data sources, pinpoint data anomalies and fraudulent activity, or highlight how a data breach occurred. Without source code analysis, the investigator may not have a full understanding of what actually happened.

Part 2 of the series, covering the types of source code analysis that can be performed and how you can integrate source code analysis into an investigation, can be found on
