Tika OneNote Support

122jxgcn
Hello,

Has anyone worked on extracting content from MS OneNote files (*.one)?
It would be great if someone could tell me how to parse OneNote files programmatically.

Thanks.

Re: Tika OneNote Support

Nick Burch-2
On Wed, 14 Nov 2012, 122jxgcn wrote:
> Is there anyone who worked on extracting contents from MS OneNote file?
> (*.one) It will be great if someone can tell me how to work with parsing
> OneNote files programatically.

I'm not aware of any existing code for this. The good news is that the file
format is fully documented:
http://msdn.microsoft.com/en-us/library/dd924743%28v=office.12%29.aspx
http://msdn.microsoft.com/en-us/library/dd951288%28v=office.12%29.aspx

You'll need to use the specification to write some code to read the
format, then plug that code into Tika. My hunch is you're looking at 5-15
days of work.
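A possible first step when working from the specification is recognising a .one file by its header. The sketch below is a minimal illustration in Java, the language Tika and POI are written in, and it is not POI or Tika code. It assumes, based on the revision store file format spec, that a .one section file begins with a 16-byte guidFileType field equal to {7B5C52E4-D88C-4DA7-AEB1-5378D02996D3}, stored in the usual mixed-endian GUID byte order. Please check that value and offset against the linked documentation. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/**
 * Hypothetical helper: checks whether a file looks like a OneNote section (.one)
 * by comparing its first 16 bytes with the guidFileType value (assumed to sit
 * at offset 0 of the revision store file header).
 */
final class OneNoteSniffer {

    // {7B5C52E4-D88C-4DA7-AEB1-5378D02996D3} as it appears on disk:
    // Data1, Data2, Data3 little-endian; Data4 in byte order as written.
    static final byte[] ONE_SECTION_GUID = {
        (byte) 0xE4, (byte) 0x52, (byte) 0x5C, (byte) 0x7B,
        (byte) 0x8C, (byte) 0xD8,
        (byte) 0xA7, (byte) 0x4D,
        (byte) 0xAE, (byte) 0xB1, (byte) 0x53, (byte) 0x78,
        (byte) 0xD0, (byte) 0x29, (byte) 0x96, (byte) 0xD3
    };

    private OneNoteSniffer() {}

    /** Formats 16 on-disk GUID bytes as the familiar {XXXXXXXX-XXXX-...} string. */
    static String guidToString(byte[] b) {
        return String.format("{%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
            b[3] & 0xFF, b[2] & 0xFF, b[1] & 0xFF, b[0] & 0xFF, // Data1, reversed
            b[5] & 0xFF, b[4] & 0xFF,                           // Data2, reversed
            b[7] & 0xFF, b[6] & 0xFF,                           // Data3, reversed
            b[8] & 0xFF, b[9] & 0xFF,                           // Data4, first 2 bytes
            b[10] & 0xFF, b[11] & 0xFF, b[12] & 0xFF,
            b[13] & 0xFF, b[14] & 0xFF, b[15] & 0xFF);          // Data4, last 6 bytes
    }

    /** True if the header is at least 16 bytes and starts with the .one guidFileType. */
    static boolean looksLikeOneSection(byte[] header) {
        return header.length >= ONE_SECTION_GUID.length
            && Arrays.equals(Arrays.copyOf(header, ONE_SECTION_GUID.length), ONE_SECTION_GUID);
    }

    /** Reads the first 16 bytes of a file (Java 11+) and checks them. */
    static boolean looksLikeOneSection(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeOneSection(in.readNBytes(ONE_SECTION_GUID.length));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + looksLikeOneSection(Path.of(arg)));
        }
    }
}
```

Only the header check is shown. Walking the file node lists and extracting text would be the bulk of the 5-15 days estimated above, and that code would then sit behind a Tika parser.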

Apache POI would probably be a good home for most of the OneNote code. If
you do get it working, or even just make progress, please consider
contributing it there!

Nick