Tuesday, April 5, 2016

Ubuntu - VM player: Ubuntu guest keeps going back to login screen on login


Scenario: Any Ubuntu VM keeps returning to the login screen even after valid credentials are entered.

Add the line

mks.gl.allowBlacklistedDrivers = "TRUE"

to the VM's .vmx file.
http://askubuntu.com/questions/443474/how-can-i-enable-3d-acceleration-for-ubuntu-as-a-vmware-guest?lq=1
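
For anyone who prefers not to open an editor, here is a minimal sketch that appends the setting only when it is missing. The helper name add_vmx_setting and the example path are made up for illustration, and it is probably safest to run it with the VM powered off.

```shell
#!/bin/sh
# add_vmx_setting is a hypothetical helper name used only in this sketch.
# It appends the setting to the given VMX file unless it is already present.
add_vmx_setting() {
    vmx="$1"
    if grep -q '^mks\.gl\.allowBlacklistedDrivers' "$vmx"; then
        echo "already set in $vmx"
        return 0
    fi
    cp "$vmx" "$vmx.bak"    # keep a backup of the original file
    # If the file does not end with a newline, add one so the new
    # setting starts on its own line.
    if [ -n "$(tail -c1 "$vmx")" ]; then
        echo >> "$vmx"
    fi
    echo 'mks.gl.allowBlacklistedDrivers = "TRUE"' >> "$vmx"
    echo "added to $vmx"
}

# Example call (the path is an assumption, adjust to your VM folder):
# add_vmx_setting "$HOME/vmware/Ubuntu/Ubuntu.vmx"
```

Running it a second time leaves the file untouched, so it is safe to repeat.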

Friday, April 1, 2016

Image PDF to text PDF using OCR

Use case: I have a multi-page PDF that was produced by scanning a physical document. I want to read it on my Kindle and make notes.

The problem: Since the PDF is made up of images, the Kindle renders each page as an image. Hence, the text on the pages is not selectable.

Solution:
Use OCR.

Ref:
http://ubuntuforums.org/showthread.php?t=880471

Steps:

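  1. pdftoppm renders each page to a ppm image. At 600 dpi that is about 100MB per page, so ideally the script should handle one page at a time and delete the ppm afterwards.
  2. Convert ppm to tif, since tesseract accepts tif.
  3. Run tesseract for OCR; it generates a txt file per page.
  4. Append each txt to the output file.
  5. Create a PDF out of the combined txt.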
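
The "iterate per page and delete" idea from step 1 could look roughly like the sketch below. It is not the script I actually ran; ocr_pdf_per_page is a made-up helper name, and it assumes pdfinfo (from poppler-utils, like pdftoppm) is installed to get the page count. Run it in an empty scratch directory, since it creates and deletes files named page*.

```shell
#!/bin/sh
# ocr_pdf_per_page is a hypothetical helper: it OCRs one page at a time so
# that only a single large .ppm exists on disk at any moment.
# Usage: ocr_pdf_per_page input.pdf output.txt
ocr_pdf_per_page() {
    pdf="$1"
    out="$2"
    # pdfinfo prints a line like "Pages:          12"
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
    : > "$out"                      # start with an empty output file
    p=1
    while [ "$p" -le "$pages" ]; do
        # Render only page $p at 600 dpi; writes page-<number>.ppm
        pdftoppm -f "$p" -l "$p" -r 600 "$pdf" page
        for ppm in page-*.ppm; do
            [ -e "$ppm" ] || continue          # nothing rendered, skip
            convert "$ppm" page.tif            # ppm to tif
            tesseract page.tif page -l eng     # writes page.txt
            cat page.txt >> "$out"
            echo "[pagebreak]" >> "$out"
            rm -f "$ppm" page.tif page.txt     # free the space right away
        done
        p=$((p + 1))
    done
}

# Example call (file names are placeholders):
# ocr_pdf_per_page scanned.pdf pdf-ocr-output.txt
```

The trade-off is one pdftoppm start-up per page, which is slower overall but keeps disk usage to a single page image.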


Script:
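#!/bin/sh
# OCR pages 1-10 of the PDF given as argument into pdf-ocr-output.txt
mkdir tmp
cp "$@" tmp
cd tmp || exit 1

# Render pages 1-10 at 600 dpi; writes ocrbook-<page>.ppm
pdftoppm -f 1 -l 10 -r 600 * ocrbook

for i in *.ppm; do
    base=`basename "$i" .ppm`
    convert "$i" "$base.tif"                # tesseract accepts tif
    tesseract "$base.tif" "$base" -l eng    # writes $base.txt
    cat "$base.txt" >> pdf-ocr-output.txt
    echo "[pagebreak]" >> pdf-ocr-output.txt
done

# Keep the combined text, clean up everything else
mv pdf-ocr-output.txt ..
rm *
cd ..
rmdir tmp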
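
The script stops at the combined text file, so step 5 still has to be done separately. One possible way, sketched below, is enscript plus ps2pdf (Ghostscript); both tools are my assumption here and not part of the original script, and txt_to_pdf is a made-up helper name. The "[pagebreak]" marker lines are turned into form feeds, which enscript treats as page breaks by default.

```shell
#!/bin/sh
# txt_to_pdf is a hypothetical helper for step 5.
# Usage: txt_to_pdf pdf-ocr-output.txt book.pdf
txt_to_pdf() {
    txt="$1"
    pdf="$2"
    # Replace each "[pagebreak]" line with a form feed character so every
    # scanned page starts on a new page, then convert text -> PostScript
    # (-B drops the page header, -p - writes to stdout) -> PDF.
    awk '$0 == "[pagebreak]" { printf "\f"; next } { print }' "$txt" |
        enscript -B -p - |
        ps2pdf - "$pdf"
}

# Example call (file names are placeholders):
# txt_to_pdf pdf-ocr-output.txt book.pdf
```

The result is a plain text PDF: the original page layout and images are gone, but the text is selectable.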