
Paper Bot: Automated Highlights from a Database of 474 Papers

Alex J. Champandard on April 16, 2015

A few years ago, we published regular Thursday Theory columns (example) here on AiGameDev.com — helping cast a spotlight on research done in Artificial Intelligence for Games. Last year, Mike Cook started Saturday Papers with a similar mandate, and this year Graham Kendall embraced social media and set up the @TCIAIG Twitter account with a script that regularly posts titles of papers from the Journal.

After a short Twitter conversation with Graham on the topic [1, 2, 3], I got hooked on the idea of posting interesting highlights from papers, just like I had done manually on the @AiGameDev Twitter account — but fully automatically. This post will dig into how it works in more detail...

NOTE: If you're interested in Artificial Intelligence, why not join us at the nucl.ai Conference in Vienna on July 20-22. We're organising it for the 7th year running!

Some Examples

The following tweets were produced by a Python script late in the development of PaperBot. The process was manually supervised, but all of this was posted by the script itself, and it serves as a good example of what we'll be covering in the following sections:

Interestingness Map

The main idea of PaperBot is to convert each page of a PDF into a 2D map of "interestingness", which is basically a greyscale image: black is completely boring and white is maximally interesting. Here are some examples from the papers posted above:

Then a randomized search algorithm finds an optimal viewport in this map, constrained to an aspect ratio between 1:1 and 2:1. It takes about 5,000 iterations to find a nicely optimal viewport, with half the iterations spent on a global search and the other half on local hill climbing.
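To make that concrete, here's a minimal sketch of what such a two-phase search could look like, assuming the interestingness map is a 2D NumPy array of floats in [0, 1]. The function names, the mean-based scoring, and the perturbation sizes are illustrative assumptions, not PaperBot's actual code:

```python
import numpy as np

def viewport_score(imap, x, y, w, h):
    # Illustrative scoring: mean interestingness inside the viewport.
    return imap[y:y + h, x:x + w].mean()

def find_viewport(imap, iterations=5000, rng=None):
    """Randomized search for a high-scoring viewport whose aspect
    ratio (w:h) stays between 1:1 and 2:1. The first half of the
    iterations is a global random search; the second half is local
    hill climbing around the best candidate found."""
    rng = rng or np.random.default_rng()
    H, W = imap.shape
    best, best_score = None, -1.0

    # Global phase: sample random viewports anywhere on the page.
    for _ in range(iterations // 2):
        h = int(rng.integers(H // 8, min(H // 2, W)))
        w = int(rng.integers(h, min(2 * h, W) + 1))   # 1:1 .. 2:1
        x = int(rng.integers(0, W - w + 1))
        y = int(rng.integers(0, H - h + 1))
        score = viewport_score(imap, x, y, w, h)
        if score > best_score:
            best, best_score = (x, y, w, h), score

    # Local phase: hill-climb by nudging the best viewport around.
    for _ in range(iterations // 2):
        x, y, w, h = best
        nx = int(np.clip(x + rng.integers(-20, 21), 0, W - w))
        ny = int(np.clip(y + rng.integers(-20, 21), 0, H - h))
        score = viewport_score(imap, nx, ny, w, h)
        if score > best_score:
            best, best_score = (nx, ny, w, h), score

    return best
```

Note that scoring by the mean tends to favor small, dense viewports; a real scoring function would presumably also reward viewport size, and the hill-climbing phase could perturb width and height as well as position.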

The challenging part is dealing with the wide variety of PDF files, along with the different author styles and tools used to create them. Tricks that work with some documents don't work with others, which was the main reason for using a multi-layered approach to computing interestingness maps...

1. Filtering Out Text

The most boring screenshots are those that contain only text, so the first approach filters out all the text with erosion. First, the page is rendered at 300dpi with the background set to dark and the foreground to white. Then a 16px erosion filter is applied, which removes most letters from the paper and leaves the interesting figures and images.

Left: The original render of one page of the PDF. Middle: Background pixels marked as boring (black) and foreground pixels as interesting (white). Right: A 16-pixel erosion filter applied, removing small characters.
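Here's a minimal sketch of this first pass using scikit-image; the foreground threshold and the helper name are my own assumptions:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.morphology import binary_erosion, disk

def text_filter_map(page_rgb):
    """Pass 1: mark foreground ink as interesting, then erode so
    that small features (letters) vanish and only larger shapes,
    such as figures and images, survive.

    `page_rgb` is one page rendered at 300dpi as an RGB array."""
    grey = rgb2gray(page_rgb)

    # Background (near-white paper) becomes boring (False);
    # foreground ink becomes interesting (True). The 0.9 threshold
    # is illustrative.
    foreground = grey < 0.9

    # A 16px erosion removes most text glyphs but keeps solid figures.
    return binary_erosion(foreground, disk(16))
```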

The only problem with this approach is that it filters out anything smaller than 16px, so many graphs and figures can be missed.

2. Identifying Figures

To isolate figures in the paper, edge detection can be used. There are a variety of solutions, but we used the detectors built into scikit-image. They are pretty slow on images at 300dpi but work very reliably, and since the bot only posts every 11 hours, performance isn't an issue!

Figure: Finding edges using a Canny edge detector followed by a probabilistic Hough transform, set up as a sequence of operations.

In practice, we don't use all the lines identified, only those above a specific length: currently 50px at 300dpi.
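Here's a sketch of how this pass could be wired up with scikit-image's built-in detectors; apart from the 50px minimum line length, the parameter values are illustrative guesses:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import canny
from skimage.transform import probabilistic_hough_line
from skimage.draw import line as draw_line

def figure_edge_map(page_rgb, min_length=50):
    """Pass 2: Canny edge detection followed by a probabilistic
    Hough transform. Only lines of at least `min_length` pixels
    (50px at 300dpi) are rasterized into the interestingness map."""
    grey = rgb2gray(page_rgb)
    edges = canny(grey, sigma=2.0)   # sigma is an illustrative guess

    interesting = np.zeros(grey.shape, dtype=bool)
    for (x0, y0), (x1, y1) in probabilistic_hough_line(
            edges, threshold=10, line_length=min_length, line_gap=3):
        # Hough lines come back as (column, row) endpoints.
        rr, cc = draw_line(y0, x0, y1, x1)
        interesting[rr, cc] = True
    return interesting
```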

Finding lines doesn't always work if there are many short segments, so a third pass is set up as an optional phase...

3. Attention To Color

A third approach is to home in on colored areas, to avoid the black & white associated with text. Here, the bot computes the colorfulness of each pixel as the sum of differences between its RGB components, and stores that in the map too.

Left: The original render of one page of another PDF. Right: Areas where the image is not greyscale, calculated as the sum of differences between all components.
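A sketch of this color pass in NumPy, assuming the page is rendered as an RGB array; the exact formula may differ from PaperBot's, but the idea is the sum of pairwise channel differences:

```python
import numpy as np

def colorfulness_map(page_rgb):
    """Pass 3: per-pixel 'colorfulness', the sum of absolute
    differences between all pairs of RGB components. Pure greys
    (R == G == B) score zero; saturated areas score high."""
    rgb = page_rgb.astype(np.int16)   # avoid uint8 wrap-around
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (np.abs(r - g) + np.abs(g - b) + np.abs(b - r)).astype(np.float32)
```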

The big issue with this approach is that in some PDF files fonts render with color, due to sub-pixel smoothing or possibly kerning. In those rare cases, the filter treats everything as interesting, even text. This pass is therefore only enabled if the other passes didn't find anything.

Conclusions & Open Issues

Overall, I'm quite impressed with the results of the script. It posts sensible images and has given me many insights into the papers I've seen so far! However, it also highlights that some papers are extremely boring, and that there's a wide range of accessibility standards when it comes to authoring papers.

Technically speaking, the script doesn't do a great job of capturing the areas around each figure, which would help provide context when the image is posted on social media. This is because the 16px erosion filters out legends and labels, so they are not taken into account by the optimization process.

If you're curious about the output of the bot, you can find a list of all the posts by the script in this Twitter Search, or just follow @AiGameDev on Twitter :-)

