A few years ago, we published regular Thursday Theory columns (example) here on AiGameDev.com — casting a spotlight on research done in Artificial Intelligence for Games. Last year, Mike Cook started Saturday Papers with a similar mandate, and this year Graham Kendall embraced social media and set up the @TCIAIG Twitter account with a script that regularly posts titles of papers from the Journal.
After a short Twitter conversation with Graham on the topic [1, 2, 3], I got hooked on the idea of posting interesting highlights from papers, just like I had done manually on the @AiGameDev Twitter account — but fully automatically. This post will dig into how it works in more detail...
The following tweets were produced by a Python script, late in the development of PaperBot. There was some manual supervision of the process, but everything here was posted by the script itself, and it serves as a great example of what we'll be covering in the following sections:
The main idea of PaperBot is to convert each page of a PDF into a 2D map of "interestingness", which is basically a greyscale image: black marks the completely boring regions and white the most interesting ones. Here are some examples from the papers posted above:
Then a randomized search algorithm finds an optimal viewport in this map, constrained to an aspect ratio between 1:1 and 2:1. It takes about 5,000 iterations to converge on a nice viewport, with half the iterations spent on a global search and the other half on local hill climbing.
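The two-phase search above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the scoring function (mean interestingness inside the box) and the perturbation sizes are assumptions.

```python
import numpy as np

def score(imap, x, y, w, h):
    # Hypothetical scoring function: mean interestingness inside the viewport.
    return float(imap[y:y + h, x:x + w].mean())

def clamp_viewport(x, y, w, h, W, H):
    # Keep the box inside the image with an aspect ratio between 1:1 and 2:1.
    h = int(max(8, min(h, H, W)))
    w = int(max(h, min(w, 2 * h, W)))
    x = int(max(0, min(x, W - w)))
    y = int(max(0, min(y, H - h)))
    return x, y, w, h

def find_viewport(imap, iterations=5000, seed=0):
    rng = np.random.default_rng(seed)
    H, W = imap.shape
    best, best_s = (0, 0, W, H), -1.0
    # Phase 1: global random search over half the iteration budget.
    for _ in range(iterations // 2):
        h = int(rng.integers(H // 4, H + 1))
        w = int(rng.integers(h, 2 * h + 1))
        cand = clamp_viewport(int(rng.integers(0, W)), int(rng.integers(0, H)),
                              w, h, W, H)
        s = score(imap, *cand)
        if s > best_s:
            best, best_s = cand, s
    # Phase 2: local hill climbing around the best candidate so far.
    for _ in range(iterations - iterations // 2):
        x, y, w, h = best
        dx, dy, dw, dh = rng.integers(-20, 21, size=4)
        cand = clamp_viewport(x + dx, y + dy, w + dw, h + dh, W, H)
        s = score(imap, *cand)
        if s > best_s:
            best, best_s = cand, s
    return best, best_s
```

The global phase keeps the search from getting stuck in one corner of the page; the hill-climbing phase then tightens the box around the densest patch of interestingness.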
The challenging part is dealing with the wide variety of PDF files, along with different author styles and tools used. Tricks that work with some documents don't work with others. That was the main reason for using a multi-layered approach to computing interestingness maps...
1. Filtering Out Text
The most boring screenshots are those that contain text, so the first approach involves filtering out all the text with erosion. First, the page is rendered at 300dpi with the background set to black and the foreground to white. Then a 16px erosion filter is applied, which removes most letters from the paper and leaves the interesting figures and images.
Left: The original render of one page of the PDF. Middle: Background pixels are marked as boring (black) and foreground marked as interesting (white). Right: Erosion filter applied by 16 pixels, removing small characters.
The only problem with this approach is that it filters out anything smaller than 16px, so many graphs and figures can be missed too.
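The erosion pass can be sketched like this. The threshold and the way the 16px radius is applied here are illustrative assumptions, not the bot's exact parameters:

```python
import numpy as np
from scipy import ndimage

def text_filter_map(page, threshold=200, radius=16):
    """Build a binary interestingness map by eroding away small glyphs.

    `page` is a greyscale page render (uint8, white background, black ink).
    """
    # Ink becomes foreground (True), the page background becomes False.
    foreground = page < threshold
    # Erode with a 3x3 structuring element, `radius` times: strokes thinner
    # than the erosion radius (i.e. most letters at 300dpi) disappear, while
    # large solid figures and images survive.
    structure = ndimage.generate_binary_structure(2, 2)
    eroded = ndimage.binary_erosion(foreground, structure=structure,
                                    iterations=radius)
    return eroded.astype(np.uint8) * 255
```

Anything that survives the erosion is kept as "interesting" in the map; thin text strokes are wiped out entirely.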
2. Identifying Figures
To isolate figures in the paper, edge detection can be used. There are a variety of solutions, but we used the ones built into scikit-image. They are pretty slow on images at 300dpi but work very reliably. The bot only posts every 11 hours, so performance isn't an issue!
Figure: Finding edges using a Canny Edge Detector then a Probabilistic Hough Transform, set up as a sequence of operations.
In practice, we don't use all the lines identified, but only those of a specific length. At 300dpi, it's currently 50px.
Finding lines doesn't always work if there are many short segments, so a third pass is set up as an optional phase...
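The edge-and-line pass maps onto scikit-image directly. A minimal sketch, assuming the page is a greyscale float image in [0, 1]; the Canny sigma and Hough threshold here are illustrative, only the 50px minimum length comes from the post:

```python
import numpy as np
from skimage.feature import canny
from skimage.transform import probabilistic_hough_line

def figure_lines(page, min_length=50):
    """Find long straight lines that likely belong to figures or charts.

    `page` is a greyscale page render as a float array in [0, 1];
    `min_length` (50px at 300dpi in the post) drops short text strokes.
    """
    edges = canny(page, sigma=2.0)
    # The probabilistic Hough transform returns line segments as
    # ((x0, y0), (x1, y1)) pairs; only segments of at least
    # `min_length` pixels are kept.
    return probabilistic_hough_line(edges, threshold=10,
                                    line_length=min_length, line_gap=3)
```

The resulting segments can then be splatted back into the interestingness map as white strokes.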
3. Attention To Color
A third approach is to hone in on colored areas, to avoid the black & white that's associated with text. Here, the bot computes how colorful each pixel is as the sum of differences between its RGB components, and stores that in the map too.
Left: The original render of one page of another PDF. Right: Areas where the image is not greyscale, calculated as the sum of differences between all components.
The big issue with this approach is that some fonts render with colors in certain PDF files, due to sub-pixel anti-aliasing. This means that in rare cases the filter treats everything as interesting, even text. This pass is only enabled if the other passes didn't find anything.
Conclusions & Open Issues
Overall, I'm quite impressed with the results of the script. It posts sensible images and has given me many insights into the papers I've seen so far! However, it also highlights that some papers are extremely boring and there's a wide range of accessibility standards when it comes to authoring papers.
Technically speaking, the script doesn't do a great job of capturing the area around each figure, which would provide context when posted on social media. This is because the 16px erosion filters out legends and labels, so they aren't taken into account by the optimization process.