Project Naptha automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image.

Unfortunately, your browser is not yet supported (but feel free to play around with this page, which shows off most of the features and works on most modern browsers); currently only Google Chrome is supported. Type in your email below and sign up for updates on this project. Depending on the number of sign-ups, a Firefox version may be released in a few weeks. If you're interested in Naptha for other browsers, email me.

Words on the web exist in two forms: there's the text of articles, emails, tweets, chats and blogs— which can be copied, searched, translated, edited and selected— and then there's the text which is shackled to images, found in comics, document scans, photographs, posters, charts, diagrams, screenshots and memes. Interaction with this second type of text has always been a second-class experience: the only way to search or copy a sentence from an image was to do as the ancient monks did, manually transcribing regions of interest.

This entire webpage is a live demo. You can watch as moving your cursor over a block of words changes it into the little I-beam. You can drag over a few lines and watch as a semitransparent blue box highlights the text, helping you keep track of where you are and what you're reading.
Hit Ctrl+C to copy the text, and paste it into a search bar, a Word document, an email or a chat window. Right-click and you can erase the words from an image, edit them, or even translate them into a different language. This was made by @antimatter15 (+KevinKwok on Google+) and Guillermo Webster.
Early in October 2013, coincidentally less than a week before I developed the first prototype of this extension, xkcd published a comic (shown on the right) which somewhat ironically depicts the impetus for the extension. The comic decries websites which arbitrarily hinder users from absentmindedly selecting random blocks of text— but the irony is that xkcd should count himself among the long list of offenders, because up until now, it simply wasn't possible to select text inside a comic. An interesting thing to note is that the language-agnostic nature of Project Naptha's underlying SWT algorithm (see the technical details by scrolling down a bit more) makes it detect the little squiggles as text as well. Depending on how you look at it, this can be seen as a bug, or a feature. Also, because handwriting detection is particularly difficult (in particular, the issue is character segmentation: it's quite difficult to separate letters which are smushed so close together as to be connected), if you try to copy and paste text from a comic, it ends up jumbled. This might be improved in the future, because certain parts of the Naptha stack do lag behind the present state-of-the-art by a few years.
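For readers curious what "SWT" refers to: the Stroke Width Transform detects text by exploiting the fact that letterforms are drawn with strokes of nearly constant width, while most other image content is not. Here is a deliberately toy sketch of that intuition (not Naptha's implementation, and not the real SWT, which casts rays along edge gradients): it measures horizontal run lengths of ink pixels in a binary bitmap and checks whether their widths are consistent. The variance threshold and helper names are made up for illustration.

```python
def horizontal_stroke_widths(bitmap):
    """Run length of each horizontal run of 1s, counted once per ink pixel."""
    widths = []
    for row in bitmap:
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                widths.extend([x - start] * (x - start))
            else:
                x += 1
    return widths

def looks_like_text(bitmap):
    """Text strokes have low width variance relative to their mean;
    arbitrary blobs do not. Threshold is arbitrary."""
    w = horizontal_stroke_widths(bitmap)
    if not w:
        return False
    mean = sum(w) / len(w)
    var = sum((x - mean) ** 2 for x in w) / len(w)
    return var / (mean * mean + 1e-9) < 0.05

# Two letter-like vertical strokes, each a constant 2px wide:
glyph = [[1, 1, 0, 0, 1, 1]] * 6

# A wedge-shaped blob whose "stroke width" grows every row:
blob = [
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
]
```

The real algorithm (Epshtein, Ofek, and Wexler's 2010 paper) additionally groups consistent-width pixels into candidate letters and chains them into lines, which is where the language-agnostic behavior mentioned above comes from: squiggles with uniform stroke width pass the test just as Latin letters do.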
It usually takes some special software to convert a scan into a PDF document that you can highlight and copy from, and this extra step means that a lot of the time, you aren't dealing with a nicely formatted and processed PDF, but a raw scan distributed as a TIFF or JPEG. Usually, that just meant suffering through the document, or in the worst case, printing it out so that I could scribble along with a pen while I read. But with this extension, it's possible to just select text from a picture, attached to an email, or linked from a class action lawsuit overview. It even works for files you have locally on your computer: simply drag the image file over to your browser window. Note that you might have to go to chrome://extensions and check the "Allow access to file URLs" checkbox.
The truth is that I've spent way too much time on reddit and 4chan in search of test images for the text detection and layout analysis algorithms. Time really does fly by when you can rationalize procrastination as something "productive". The result is that my test corpus is something on the order of 50% internet memes (in particular, I'm a fan of Doge, in part because Comic Sans is interpreted remarkably well by the built-in Ocrad text recognizer). It's actually a bit difficult to recognize the text of the standard-template internet meme (mad props to CaptionBot, bro). Bold Impact font is notoriously hard to recognize with general-purpose text recognizers, because a lot of what distinguishes its letters isn't the overall shape, but rather the subtle rounding of corners (compare D, 0, O) or relatively short protrusions (the stubby little tail that differentiates L from I). I started building a text recognizer specifically designed for Impact font, and it was actually working pretty well, but I kind of misplaced the code somewhere. So, until I find it or replace it, you'll have to use Tesseract configured with the "Internet Meme" language.
Screenshots are a nice way to save things in a state that you can recall later in a more or less complete form— the only caveat being the fact that you would have to re-type the text later if you find a need for it. On the other hand, copying and saving just the text of something ends up losing the spatial context of its origin. Project Naptha kind of transforms static screenshots into something more akin to an interactive snapshot of the computer as it was when the screen was captured. While clicking on buttons won't submit forms or upload documents, the cursor changes when hovering over different parts, and blocks of text become selectable, just as they were before being frozen in carbonite. While it's not a perfect substitute (the text recognition screws up every once in a while, so the reconstruction isn't reliably perfect), it still has a rather significant and profound effect.
With the same trick that Translation uses, it's possible to substitute in your own text. This will probably work better in the future, once there's some actual font-detection logic beyond "if uppercase and super bold, then Impact; if uppercase otherwise, then xkcd font; and for everything else, Helvetica Neue". I don't know where else to mention this, because it's one of those little things that simultaneously applies to everything and nothing at once— but it's also possible to select multiple regions by holding the shift key. I spent way too long writing the algorithms to merge multiple selection regions when appropriate. Try it out: Highlight some meme text. Right click on the selection and click "Reprint Text", which can be found under the "Translate" menu. After that, select the text on one region that you'd like to edit and click "Modify Text", which should appear in the context menu.
During May 2012, I was reading about seam carving, an interesting and almost magical algorithm which can rescale images without visibly squishing them. After playing with the little seams that the seam carver tended to generate, I noticed that they tended to arrange themselves in a way that cut through the spaces between letters (dynamic programming approaches are actually fairly common when it comes to letter segmentation, but I didn't know that at the time). It was then, while reading a particularly verbose smbc comic, that I thought it should be possible to come up with something which would read images (with