AI-powered tools such as ChatGPT have attracted considerable interest in the software development community. These tools have many applications, including securing Web3 platforms and smart contracts. We have already discussed their potential to improve the auditing process by summarizing program functions and helping to find code vulnerabilities.
ChatGPT and similar models use deep learning, a subset of machine learning, as their underlying technology. Over the past decade, deep learning has been extensively researched as a means of improving software security. While using a chatbot like ChatGPT to ask questions about code is one approach to enhancing software security, there are many other exciting ways in which deep learning can be applied. Much of this research is still at an early stage, and many ideas are only now reaching implementation and testing. In this article, we review the current state of the art in applying deep learning to software security.
Deep learning is a type of machine learning that involves training a system to recognize patterns in data. The system is provided with a set of inputs and their corresponding classifications, and it learns to identify patterns that differentiate one input from another. Once trained, the system can be used to classify inputs it hasn't seen before. This allows for the creation of powerful models that recognize complex patterns in data and can support tasks such as locating bugs or understanding programs.
Traditional Software Security Techniques
Current approaches to identifying vulnerable code and securing programs include heuristics, static analysis, formal verification, and fuzzing, among others.
Heuristics are methods for identifying insecure coding patterns based on rules of thumb or experience-based techniques. Common heuristics include avoiding the use of unsafe functions, properly handling user input, avoiding the use of hard-coded passwords, and validating all inputs before processing them.
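As a rough illustration of a heuristic check, the sketch below scans source text line by line for two of the patterns mentioned above. The rule names, the regular expressions, and the `scan` function are assumptions made for this example, not the rules of any real tool.

```python
import re

# Hypothetical rule set for illustration; real tools use far richer rules.
RULES = [
    ("hard-coded password",
     re.compile(r"password\s*=\s*[\"'][^\"']+[\"']", re.IGNORECASE)),
    ("unsafe function call",
     re.compile(r"\b(eval|exec|strcpy|gets)\s*\(")),
]

def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RULES:
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

SAMPLE = 'user = input()\npassword = "hunter2"\nresult = eval(user)\n'
print(scan(SAMPLE))
```

Rules like these are cheap to run but only match the surface form of the code, which is one reason they produce both misses and false alarms.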
Code bugs can also be detected using tools that conduct static code analysis both at the source code and bytecode levels. These tools have the ability to pinpoint vulnerable code patterns and create visual representations to aid in the understanding of smart contracts. Over time, the accuracy and effectiveness of the tools will increase as the database of discovered issues grows.
In addition to static analysis, CertiK performs formal verification of code to ensure it meets its requirements. Formal verification is a mathematical process used to prove that a program functions as anticipated by expressing program properties and expected behavior as mathematical formulas and using automated tools to verify them.
Fuzzing, also known as fuzz testing, is a technique used for software testing that involves injecting invalid, uncommon, or arbitrary data into a program. The software is then observed for any crashes, failures in built-in assertions, or potential vulnerabilities that may arise as a result of this input.
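A minimal sketch of random fuzzing, under stated assumptions, might look like the following. The target `parse_length_prefixed` is a toy function invented for this example, and the fuzzer simply feeds it short random byte strings and records any input that fails a built-in assertion.

```python
import random

def parse_length_prefixed(data: bytes) -> bytes:
    """Toy target (hypothetical): the first byte declares the payload length."""
    if not data:
        raise ValueError("empty input")  # handled rejection, not a bug
    length = data[0]
    payload = data[1:1 + length]
    # Built-in assertion that the fuzzer tries to violate.
    assert len(payload) == length, "declared length exceeds payload"
    return payload

def fuzz(target, trials: int = 1000, seed: int = 0) -> list[bytes]:
    """Feed random inputs to `target` and collect those that make it fail."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        size = rng.randrange(0, 8)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # expected rejection of malformed input
        except Exception:
            failures.append(data)  # assertion failure or unexpected crash
    return failures

failures = fuzz(parse_length_prefixed)
print(f"{len(failures)} failing inputs found")
```

Production fuzzers add coverage feedback, input mutation, and crash triage on top of this basic loop.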
How Deep Learning Can Enhance Traditional Software Security
Despite their strengths, current vulnerability identification techniques often encounter challenges out in the wild, especially when performing large-scale smart contract audits. For example, it is not feasible to use manual inspection to classify the bugs in the CertiK database of manual audit reports with over 60,000 findings. Aside from the sheer number of bugs, the diversity of natural language used to describe them renders traditional phrase matching ineffective.
As another instance of a scaling challenge, static tools generate findings that must be manually verified by security engineers to eliminate false positives and improve the quality of results. False positives occur when a static tool identifies a benign or non-critical issue as a bug. Often, false positives cause developers to waste time and effort investigating and fixing issues that do not exist. In some cases, false positives can also lead to legitimate bugs being overlooked, as developers may become desensitized to the large number of false positives generated by the tool.
Large projects also pose challenges to formal verification and fuzzing. There are usually white papers and design documents in English describing what smart contracts are supposed to do. Extracting mathematical rules from the white papers, design documents, and programs is a difficult and time-consuming process. The complexity of smart contracts can also result in a large number of possible execution paths, and with larger systems, fuzzing all possible paths becomes impossible.
Due to the challenges of current techniques, it is essential to find new and more efficient ways to improve smart contract and program security. Deep learning can alleviate the challenges associated with existing vulnerability detection techniques. Deep learning has been applied to vulnerability classification, specification inference, automated bug explanation, predicting program properties, reducing the number of false positives reported by traditional tools, and enhancing fuzzing. We will now examine how deep learning can be used to improve some existing techniques.
Code classification aims to assign code fragments (with or without their documentation) to a variety of classes. This can be a multi-class classification over the types of bugs present in a piece of code (e.g., coding style, logical issue, mathematical issue, etc.) or over its functionality (e.g., sorting algorithms, sending funds, etc.). There is an abundance of research on code classification using deep learning, and these techniques can greatly help in navigating and analyzing large databases of bug findings.
In code clone detection, identical or similar pieces of source code or bytecode, known as clones, are found within or between software projects. Clone matches can help security engineers avoid missing previous findings when examining similar projects and functions. To achieve this, a similarity score is computed for each pair of code fragments; if the score exceeds a certain threshold, the pair is considered similar. Syntactic sugar and the diversity of coding styles can create problems for traditional (or rule-based) approaches to computing similarity scores. By contrast, a deep learning model can take the complex semantics of code snippets or binaries into account when computing scores. There is even the possibility of detecting code matches between different programming languages.
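The score-and-threshold step can be sketched as follows. This example is an assumption-laden stand-in: it uses a simple bag of normalized tokens where a deep learning model would use learned embeddings, and the keyword set, the `0.9` threshold, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Small illustrative keyword set; everything else that looks like a name
# is collapsed to "ID" so that renamed variables still match.
KEYWORDS = {"def", "return", "for", "in", "if", "else", "while"}

def token_vector(code: str) -> Counter:
    """Count tokens after normalizing identifiers and numeric literals."""
    counts = Counter()
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", code):
        if tok in KEYWORDS:
            counts[tok] += 1
        elif tok[0].isalpha() or tok[0] == "_":
            counts["ID"] += 1
        elif tok[0].isdigit():
            counts["NUM"] += 1
        else:
            counts[tok] += 1
    return counts

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def is_clone(x: str, y: str, threshold: float = 0.9) -> bool:
    """Treat a pair as clones when the score reaches the threshold."""
    return cosine(token_vector(x), token_vector(y)) >= threshold

A = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
B = "def sum_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
C = "if ready: print('go')"
print(is_clone(A, B), is_clone(A, C))
```

Token counts ignore ordering and meaning, so this stand-in is easily fooled by restructured code; a learned model replaces `token_vector` with an embedding that captures semantics.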
An alternative or complementary approach to bug detection using static analysis is neural bug finding: finding bugs using deep learning techniques. In general, bug detection can be seen as a classification problem that can be addressed with deep neural networks trained on examples of buggy and non-buggy code. Researchers have proposed various instances of such a technology, including frameworks such as CodeTrek. CodeTrek integrates traditional static analysis and relational databases with deep learning to identify bugs in large, real-world projects. Interestingly, some bugs are very hard for traditional logic-based analyzers to detect. Variable misuse bugs, for instance, where a developer uses a wrong but type-compatible variable, do not violate any general logic rule, so even powerful static analysis frameworks tend to miss them, whereas various neural architectures have shown remarkable promise in spotting them. A learning-based approach is especially useful because new types of attacks emerge frequently, and traditional rule-based approaches may not be able to keep up.
A significant challenge in vulnerability detection using deep learning is the lack of high-quality real-world data generated by human experts. In fact, some vulnerability detection models are trained on datasets that do not necessarily reflect how vulnerabilities appear in real-world code. CertiK's large database of previously identified bugs provides a promising dataset for developing a learning-based vulnerability detection tool.
In software development, prioritizing bugs helps code owners focus their resources on fixing the most critical bugs first. Deep learning can be used to assist in prioritizing bugs by analyzing bug reports and identifying patterns that indicate the severity of the issue. Such a model can be trained on a dataset of bug reports that have already been ranked, and it can learn to recognize patterns in the data that are associated with high-priority bugs. To ensure that the model is accurate and effective, it is imperative to use high-quality data for training and testing. This includes ensuring that the bug reports are labeled correctly and that the dataset includes a representative sample of bugs across a range of severity levels. At CertiK, security engineers assign severity scores to bug findings after manual inspection. These scores act as a ranking scheme. Such a collection of high-quality data is invaluable for training an effective model.
Deep learning can be used to explain bugs in software by analyzing log files and identifying patterns that indicate the root cause of the issue. This approach can help developers diagnose and fix bugs more quickly and accurately, saving time and improving the quality of software. A deep learning-based model can generate natural language explanations for software bugs by learning from a large corpus of bug fixes. To generate a natural language explanation for a bug, a model must leverage structural and semantic information about the program and bug patterns. This can be particularly useful in large and complex software systems, where it can be difficult and time-consuming to manually diagnose and fix every bug. Additionally, a bug-finding model that gives reasonable explanations for its findings provides the transparency that security engineers and other experts need before they can trust it in real-world, high-stakes conditions.
Reducing False Positives
False positives are a common challenge for bug-finding tools. As we mentioned earlier, false positives occur when a static tool identifies a benign or non-critical issue as a bug. Deep learning can be used to help reduce the false positive rate in these tools. A deep learning-based model can discover the program structures that tend to trigger false alarms in a static tool, and these correlations can then be used to build a true-positive versus false-positive classifier that predicts whether a new error report is also likely to be a false positive.
The challenging step in applying formal verification is extracting the intended behavior of software, in the form of logical or mathematical formulas, from its source code and/or documentation. Automatically inferring this behavior, which is known as specification inference, can remedy this challenge. Researchers have proposed various approaches to specification inference, such as inferring loop invariants. In one study, the authors presented a method for automatically determining assertions for specific program points using program execution traces. The approach is based on deep neural networks and has shown promising results in detecting real-world software errors. Another study by Prof. Ronghui Gu, CertiK’s co-founder, and other researchers at Columbia University proposes a deep neural network approach to infer loop invariants, which, again, models loop behavior from program execution traces. The authors showed that their method outperforms state-of-the-art techniques in terms of accuracy and efficiency. Additionally, some studies have focused on learning nonlinear loop invariants, which can be challenging to infer using traditional methods.
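As a minimal sketch of such a classifier, the example below trains a single-neuron model (logistic regression, the simplest stand-in for a neural network) on a tiny invented dataset. The two features, the labels, and every name in the code are assumptions for illustration only; a real system would learn from program structure and from many labeled reports.

```python
import math

# Hypothetical features per static-analysis report:
#   (guarded_by_check, in_test_code); label 1 = real bug, 0 = false positive.
REPORTS = [((0, 0), 1), ((1, 0), 0), ((0, 1), 0), ((1, 1), 0)]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, features) -> float:
    """Predicted probability that a report is a real bug."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(data, epochs: int = 5000, lr: float = 0.5):
    """Fit a single-neuron classifier with batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, label in data:
            error = score(weights, bias, features) - label
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias

def is_true_positive(model, features) -> bool:
    weights, bias = model
    return score(weights, bias, features) >= 0.5

MODEL = train(REPORTS)
for features, label in REPORTS:
    print(features, "real bug" if is_true_positive(MODEL, features) else "false positive")
```

Reports that the model scores as likely false positives could be ranked lower rather than discarded, so that an engineer still has the final say.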
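To make the role of execution traces concrete, the sketch below records the state of a toy summation loop at every iteration and filters a list of candidate invariants against those traces. The loop, the hand-written candidates, and all names are assumptions for this example; in the learning-based approaches described above, a neural model proposes the candidates, and a verifier, rather than trace checking alone, establishes that a surviving candidate actually holds.

```python
def sum_loop_trace(n: int) -> list[dict[str, int]]:
    """Run `while i < n: i += 1; s += i` and record the state at each loop head."""
    i, s = 0, 0
    trace = [{"i": i, "s": s, "n": n}]
    while i < n:
        i += 1
        s += i
        trace.append({"i": i, "s": s, "n": n})
    return trace

# Hypothetical candidate invariants; the first one is nonlinear.
CANDIDATES = {
    "2*s == i*(i+1)": lambda st: 2 * st["s"] == st["i"] * (st["i"] + 1),
    "i <= n": lambda st: st["i"] <= st["n"],
    "s == i": lambda st: st["s"] == st["i"],
}

def surviving_invariants(traces) -> list[str]:
    """Keep only candidates that hold in every recorded state."""
    return [
        name
        for name, check in CANDIDATES.items()
        if all(check(state) for trace in traces for state in trace)
    ]

TRACES = [sum_loop_trace(n) for n in range(6)]
print(surviving_invariants(TRACES))
```

Surviving a finite set of traces only shows that a candidate is consistent with the observed runs, which is why the inferred formulas are handed to a formal verification tool afterward.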
Deep learning can be used to improve program fuzzing by leveraging its ability to learn patterns and extract features from large amounts of data. It can enhance fuzzing by guiding input generation towards more interesting instances that are more likely to trigger crashes. Models can be trained to learn which input parameters or sequences of inputs are more likely to cause a crash or trigger a buffer overflow. This can improve the coverage of the fuzzer by exercising different parts of the program, which in turn can help discover bugs or vulnerabilities that traditional fuzzing techniques might have missed.
The application of deep learning to improve software security has great potential. The technology has already demonstrated its efficacy in several areas, including vulnerability detection, code classification, clone detection, and more. Traditional software security techniques, such as heuristics, static analysis, formal verification, and fuzzing, have their strengths but also face scaling challenges. Deep learning can provide more efficient and effective ways to address these challenges and enhance the security of smart contracts and programs. Although much of the research into deep learning's application to software security is still at an early stage, there are exciting opportunities for continued innovation and development in this area. As the industry continues to grow, the integration of software security and deep learning technologies will play an increasingly pivotal role in improving the security and reliability of software systems.