Last year, Google released its robots.txt parser and matcher to the open source world. Since then, people have used it to build new tools, contributed to the open source library, and released versions in other languages (such as Go and Rust).
With the intern season ending at Google, the company wants to highlight two new releases related to robots.txt that were made possible by two interns working on the Search Open Sourcing team: Andreea Dutulescu and Ian Dolzhanskii.
First, Google is releasing a testing framework for robots.txt parser developers, created by Andreea. The project provides a testing tool that can validate whether, and to what extent, a robots.txt parser follows the Robots Exclusion Protocol. Until now there has been no official, thorough way to assess the correctness of a parser, so Andreea built a tool that developers can use to create parsers that follow the protocol.
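To make the idea of protocol testing concrete, here is a minimal sketch of what a compliance check for a parser could look like. It is not Andreea's framework: the `Case` type, the `CASES` list, and the `run_cases` helper are hypothetical names invented for illustration, and Python's standard-library `urllib.robotparser` stands in as the parser under test. The cases cover only a few basic behaviors (an empty file, a disallowed prefix, and group selection by user agent), assuming those are among the things such a framework would check.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class Case:
    """One expectation: may `user_agent` fetch `url` under `robots_txt`?"""
    name: str
    robots_txt: str
    user_agent: str
    url: str
    expected_allowed: bool


# Hypothetical robots.txt bodies used by the illustrative cases below.
PREFIX_RULES = "User-agent: *\nDisallow: /private/\n"
GROUP_RULES = "User-agent: FooBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"

# Illustrative cases only; a real test suite would be far more thorough.
CASES = [
    Case("empty file allows everything", "",
         "FooBot", "https://example.com/page", True),
    Case("disallowed prefix is blocked", PREFIX_RULES,
         "FooBot", "https://example.com/private/data", False),
    Case("other paths stay allowed", PREFIX_RULES,
         "FooBot", "https://example.com/public/data", True),
    Case("specific group applies to the named crawler", GROUP_RULES,
         "FooBot", "https://example.com/page", False),
    Case("wildcard group applies to other crawlers", GROUP_RULES,
         "BarBot", "https://example.com/page", True),
]


def stdlib_is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Adapter that exposes urllib.robotparser as the parser under test."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def run_cases(is_allowed: Callable[[str, str, str], bool],
              cases: List[Case]) -> List[str]:
    """Run every case against a parser and return the names of failures."""
    failures = []
    for case in cases:
        actual = is_allowed(case.robots_txt, case.user_agent, case.url)
        if actual != case.expected_allowed:
            failures.append(case.name)
    return failures


if __name__ == "__main__":
    failed = run_cases(stdlib_is_allowed, CASES)
    print(f"{len(CASES) - len(failed)}/{len(CASES)} cases passed")
    for name in failed:
        print(f"FAILED: {name}")
```

Because the parser is passed in as a plain callable, the same cases can be run against any implementation by writing a small adapter like `stdlib_is_allowed`, and the list of failures gives a rough measure of how far a parser deviates from the expected behavior.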
Second, Google is releasing a Java port of its C++ robots.txt parser, created by Ian. The parser is a 1-to-1 translation of the C++ parser in terms of functions and behavior, and it has been thoroughly tested for parity against a large corpus of robots.txt rules. Teams are already planning to use the Java robots.txt parser in Google production systems, and the company welcomes contributions to both projects.
Google says it was a genuine pleasure to host Andreea and Ian, and that the team is sad to see their internships end. Their contributions help make the Internet a better place, and the company hopes to welcome them back in the future.