Software bugs are a story as old as time, which in the case of programming means around 75 years. In 1947, programmer Grace Murray Hopper was working on a Mark II computer at Harvard University when her team found a moth stuck in a relay, preventing the computer program from working. It is often cited as the first computer “bug”, and countless more have followed since.
In the history of programming, bugs have ranged from harmless to absolutely catastrophic. In 1986 and 1987, several patients were killed after a Therac-25 radiation therapy machine malfunctioned because of an error by an inexperienced programmer, and a software bug may also have triggered one of the largest non-nuclear explosions in history, on a Soviet trans-Siberian gas pipeline.
While such events are rare, it’s safe to say that software bugs can do a lot of damage and waste a lot of time and resources. According to one recent analysis, the average programmer produces 70 bugs for every 1,000 lines of code, and each bug takes 30 times longer to fix than it took to write the code in the first place. In the United States alone, approximately $113 billion is spent identifying and fixing code bugs.
That may soon change.
Microsoft recently announced the creation of a machine learning model that can accurately identify high-priority bugs 97% of the time. The model has an even higher success rate (99%) in distinguishing security bugs from non-security related bugs.
In a recent report, Scott Christiansen, senior security program manager at Microsoft, praised the algorithm, adding that Microsoft’s ultimate goal was to design a bug detection system “as close as possible” to the precision of a security expert.
“We have found that by pairing machine learning models with security experts, we can dramatically improve the identification and classification of security bugs.”
The bug detection system uses two statistical techniques: term frequency-inverse document frequency (TF-IDF) weighs the keywords in a bug report by their relevance, and a logistic regression model estimates the probability that the report belongs to a specific class.
Then the program separates security bugs from non-security bugs and rates them as “critical”, “important”, or “low impact”.
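To make the two techniques concrete, here is a minimal Python sketch of the first step only: TF-IDF weighting of report titles followed by a logistic regression that estimates how likely a title is to describe a security bug. It is not Microsoft’s code. The training titles, the helper names (`fit_idf`, `tfidf`, `train_logreg`, `security_probability`) and the training settings are all invented for illustration, and a real system would learn from a far larger set of labeled reports.

```python
import math
from collections import Counter

# Hypothetical toy data: (report title, 1 = security bug, 0 = not security).
TRAINING = [
    ("buffer overflow in image parser", 1),
    ("sql injection in login form", 1),
    ("remote code execution via crafted packet", 1),
    ("cross site scripting in comment field", 1),
    ("button misaligned on settings page", 0),
    ("typo in welcome message", 0),
    ("slow page load on dashboard", 0),
    ("wrong colour for disabled button", 0),
]

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + count)) + 1.0 for t, count in df.items()}

def tfidf(text, idf):
    """L2-normalised TF-IDF vector as a dict; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(vec, weights, bias):
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in vec.items()))

def train_logreg(vectors, labels, epochs=500, lr=0.5):
    """Logistic regression fitted with per-example gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = score(vec, weights, bias) - y  # gradient of the log loss
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias

titles = [title for title, _ in TRAINING]
labels = [label for _, label in TRAINING]
idf = fit_idf(titles)
weights, bias = train_logreg([tfidf(t, idf) for t in titles], labels)

def security_probability(title):
    """Estimated probability that a report title describes a security bug."""
    return score(tfidf(title, idf), weights, bias)

if __name__ == "__main__":
    for title in ("sql injection in search form", "typo on settings page"):
        print(f"{title!r}: {security_probability(title):.2f}")
```

The severity rating could be sketched the same way, as a second classifier trained on severity labels instead of security labels. That is an assumption about the overall shape of the pipeline, not a description of Microsoft’s implementation.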
The algorithm is still under development, but Microsoft has announced that it will make its research open source on GitHub, which could end up saving coders around the world a lot of time and energy.
In the meantime, you can read the published academic article, “Identifying Security Bug Reports Based Only on Report Titles and Noisy Data”, for more details.
“Every day, software developers review a long list of features and bugs that need to be fixed,” Christiansen said. “Security professionals try to help by using automated tools to prioritize security bugs, but too often engineers waste time on false positives or miss a critical security vulnerability that has been misclassified. To solve this problem, the data science and security teams came together to explore how machine learning could help.”