OT- Automatic book scanning

Software that seems to work quite well:
Software isn't the problem. There are plenty of programs to handle OCR (some better than others).

The problem is handling the paper.

If you can burst the book, or if it's already loose-leaf, any sheet-feed scanner will do the job.

- Leigh
 
If you can burst the book, or if it's already loose-leaf, any sheet-feed scanner will do the job.
If I could burst the book, or if it were already loose-leaf, obviously I wouldn't be looking at specialized book scanners, now would I ? :rolleyes5:
 
Keelan,

Cool stuff, that Unpaper.

Milacron,

Seems the paper handling is the neat part. From what I can see, they simply use a strip of vacuum holes on the scanner head. Not a particularly new idea, but a nice application. I had a 20-year-old HP plotter that used the same concept. The USPS uses vacuum heads to pick up individual letters from a stack for scanning (when they won't feed into the belt scanner properly).

The Treventus people just packaged it up all nicely. Looks like they are trying to bring it to North American markets.

If I were you I would try and copy the Treventus system rather than the "hand turned" Instructables page.

-DU-
 
The suction page turner is really quite neat, and combining page-turning with flush scanning, instead of photography, is quite ingenious. Hopefully they have a blower behind the book to leaf out/separate the pages, like some sheetfed presses do (Heidelbergs come to mind).

Anyway, pricing...I might guess the low-to-mid six-figure range for the ScanRobot, depending on options.

The ATIZ BookDrivePro sells for something around $12,000, and includes no auto page turner. The Kirtas Tech scanner supposedly costs around $140,000, for the full system, including OCR, etc. though that pricing might be a bit old. It uses a vacuum page turner that functions sort of like a human hand.

My better half works at a major research library, so I hear tidbits...If you want more info, I can ask.
 
Software isn't the problem. There are plenty of programs to handle OCR (some better than others).

The problem is handling the paper.

If you can burst the book, or if it's already loose-leaf, any sheet-feed scanner will do the job.

- Leigh

Leigh,

RTFA, the link I provided is for software that de-skews and de-craps the scanned image. It doesn't perform OCR at all.

I suspect that if cutting off the binding were an option, Milacron wouldn't have posted links pertinent to machines that scan bound books.
 
The ATIZ BookDrivePro sells for something around $12,000, and includes no auto page turner. The Kirtas Tech scanner supposedly costs around $140,000, for the full system, including OCR, etc. though that pricing might be a bit old. It uses a vacuum page turner that functions sort of like a human hand.
$140,000 !!! :eek::eek: Good lord, I had no idea... maybe Haas needs to come out with one of these :)

Just now looked at the Cambridge University video http://www.kirtas.com/ and it's interesting that even with the automatic page turner they still employ folks to stand there to flatten each turned page by hand...now wouldn't that be an exciting job ! :wrong: :sleepy:
 
Your even more exciting job is going through it all afterwards correcting all the mistakes the OCR has made.
Wouldn't it be the case that, for many of the books they are copying, OCR is not needed ? In other words, aren't they just copying for direct reprint of what is scanned ?

Or is it the case that they do need to use OCR in order to make the books "searchable" via keywords ? Perhaps there are two versions...direct reproduction, to preserve the original fonts and feel of the book, and the OCR version for keyword search possibilities. In which case I would think it wouldn't be so critical for the OCR version to be without some errors here and there.
 
Most of the stuff they scan will go into a digital version for manipulation and reproduction; it's not often they do a direct image scan and then go straight to print.
 
Got to be pretty straightforward with a little bit of tinkering to collect a bunch of images and turn the pages.

Vacuum attachment on the head of a CNC mill to grab the pages with a relay to turn the vacuum on and off. Mount a digital camera on the head too. Lots of remote shutter releases available that can be hooked to a relay as well. Need some fixturing to hold the book down while it's going on (vacuum table?), and potentially to keep pages flat. The latter could be some air clamps with minimal pressure and soft tips so they are more gentle.

I know peeps here have built more complex fixturing than what this would take for other CNC work.

G-code to go down, flip a page, position the camera, snap a pic. Rinse, repeat.
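For what it's worth, here is a minimal Python sketch of that loop, generating the G-code rather than hand-writing it. Everything specific in it is an assumption for illustration: the `M64`/`M65` digital-output codes for the vacuum and shutter relays (LinuxCNC-style; substitute whatever your controller uses), the coordinates, the feed rate, the page thickness, and a `G4 P` dwell given in seconds. A real page turn would also need an arc move and some page hold-down, which this ignores.

```python
# Sketch only: relay codes, coordinates and timings below are assumed values.
VACUUM_ON = "M64 P0"    # assumed: digital output 0 drives the vacuum relay
VACUUM_OFF = "M65 P0"
SHUTTER_ON = "M64 P1"   # assumed: digital output 1 drives the shutter relay
SHUTTER_OFF = "M65 P1"


def turn_and_shoot(page_index, pick_x=150.0, drop_x=-150.0, y=0.0,
                   z_safe=50.0, z_stack=5.0, page_thickness=0.1,
                   cam=(0.0, 0.0, 120.0), dwell=0.5):
    """Return G-code lines for one cycle: go down, flip a page, shoot."""
    # The right-hand stack gets lower by one page thickness per turn.
    z_pick = z_stack - page_index * page_thickness
    return [
        f"(page {page_index})",
        f"G0 Z{z_safe:.2f}",                  # retract before travelling
        f"G0 X{pick_x:.2f} Y{y:.2f}",         # move over the right-hand page
        f"G1 Z{z_pick:.2f} F300",             # ease down onto the paper
        VACUUM_ON,
        f"G4 P{dwell}",                       # let the vacuum grab the page
        f"G0 Z{z_safe:.2f}",                  # lift the page
        f"G0 X{drop_x:.2f}",                  # carry it across the spine
        VACUUM_OFF,
        f"G4 P{dwell}",                       # let the page settle
        f"G0 X{cam[0]:.2f} Y{cam[1]:.2f} Z{cam[2]:.2f}",  # camera position
        SHUTTER_ON,
        f"G4 P{dwell}",                       # hold the shutter contact closed
        SHUTTER_OFF,
    ]


def build_program(pages):
    """Rinse, repeat: one cycle per page, wrapped in a basic header/footer."""
    lines = ["G21", "G90"]                    # millimetres, absolute positioning
    for i in range(pages):
        lines += turn_and_shoot(i)
    lines.append("M2")                        # end of program
    return lines


print("\n".join(build_program(2)))
```

Generating the program from a script makes it easy to change the page count or the pick height for a different book without retyping the whole cycle.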

Best,

BW
 
In the ScanRobot by Treventus, any thoughts on how the 2 pages, once scanned, always fall to the left side ? I wonder if it's via the servo positioning of the cradle that puts the book binding in a favorable position for the pages to fall to the left, or via a slight air blast or static electricity or a small arm I can't see or ??

http://www.youtube.com/watch?v=y16rNqnxj0U

I just now did an experiment with a book and it's interesting that for the binding to be positioned for "left fall" it actually needs to be just the opposite of how the book is in the video...so that may not be how they are working that out. I suppose another possibility might be a separate vacuum on the left side or a fan on the right side.
 
Or is it the case that they do need to use OCR in order to make the books "searchable" via keywords ?

In the stuff I scan on a routine basis (old telephone documents), my goal is high-quality page image reproduction, searchable text being secondary. I do the OCR thing for all the pages to be searchable, but I'm not going to proofread and correct every line of OCRed text. It's usually good enough. That's just me, but I'm not alone; Google OCRs all the books they have in Google Books to make them searchable, but from what I've seen, they don't make an extraordinary effort to make the OCRed text 100% accurate either.
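
A rough illustration of why "good enough" OCR still works for keyword search: a fuzzy match tolerates the usual misreads (such as "m" coming out as "rn"). This is a minimal sketch using Python's standard `difflib`; the function name `fuzzy_hits`, the sample text, and the 0.75 cutoff are made up for illustration, and it is not a description of how Google Books or any particular OCR package does it.

```python
import difflib
import re


def fuzzy_hits(keyword, ocr_text, cutoff=0.75):
    """Return words in ocr_text that are close to keyword, best match first."""
    # Lower-case and split into words so matching ignores case and punctuation.
    words = sorted(set(re.findall(r"[a-z0-9']+", ocr_text.lower())))
    return difflib.get_close_matches(keyword.lower(), words, n=5, cutoff=cutoff)


# Hypothetical OCR output where "magneto" was misread as "rnagneto".
page = "The telephone exchange uses a rnagneto ringer."

print(fuzzy_hits("magneto", page))
print(fuzzy_hits("telephone", page))
```

Lowering the cutoff finds more mangled words but also more false hits, which is the same trade-off as deciding how much proofreading is worth doing.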