Library to Convert Word Document Text to HTML
Solution 1:
Do you just want to convert a *.doc file to HTML? Is saving it as an HTML file an option?
Word's standard SaveAs method has an option to save as HTML:
wdFormatHTML Saves all text and formatting with HTML tags so that the resulting document can be viewed in a Web browser.
from: MSDN SaveAs Method
For a tutorial on using this method to convert a .doc file to a different format, see: How to convert DOC into other formats using C#.
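As a rough sketch of the SaveAs approach, the same COM call can be driven from Python. This assumes Windows with Word installed and the pywin32 package, neither of which the answer mentions. The function name `doc_to_html` and the optional `word` parameter are made up for illustration. The value 8 is the numeric value of `wdFormatHTML`.

```python
import os

WD_FORMAT_HTML = 8  # numeric value of Word's wdFormatHTML constant


def doc_to_html(src_path, dst_path, word=None):
    """Open src_path in Word and save it as HTML at dst_path.

    `word` is a hypothetical hook: pass an existing Word.Application
    object to reuse it, or leave it as None to start one via pywin32.
    """
    owns_word = word is None
    if owns_word:
        # Imported lazily so this module still loads without pywin32.
        import win32com.client
        word = win32com.client.Dispatch("Word.Application")
        word.Visible = False

    doc = None
    try:
        # Word resolves relative paths against its own working directory,
        # so absolute paths are passed here.
        doc = word.Documents.Open(os.path.abspath(src_path))
        doc.SaveAs(os.path.abspath(dst_path), FileFormat=WD_FORMAT_HTML)
    finally:
        if doc is not None:
            doc.Close(False)  # close without prompting to save changes
        if owns_word:
            word.Quit()
```

Calling `doc_to_html("report.doc", "report.html")` should write the HTML file in the current directory. The file names are placeholders.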
If you have *.docx files instead of *.doc files, it is even easier, because you can use the Open XML API, as explained on MSDN in Manipulating Word 2007 Files with the Open XML Format API (Part 1 of 3). Once you have the XML of the Word file, you can transform it into whatever output format you want, including HTML.
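To illustrate the "get the XML and output HTML" idea without the Open XML SDK, here is a minimal sketch that relies only on the fact that a .docx file is a ZIP archive whose main body lives in `word/document.xml`. It swaps the SDK for Python's standard library. The function name `docx_to_html` is made up, and the sketch keeps only paragraph text: formatting, tables, images and lists are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_html(docx_file):
    """Return one <p> element per Word paragraph, text only.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every <w:t> in it.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(f"<p>{escape(text)}</p>")
    return "\n".join(paragraphs)
```

A real converter would also need to map runs, styles and tables to HTML. This only shows that the document body is ordinary XML you can walk yourself.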
Solution 2:
Convert your .doc files to PDF with the help of JODConverter and OpenOffice (see How to convert ppt to images in Ruby? for reference),
and then use pdftohtml (http://pdftohtml.sourceforge.net), a utility that converts PDF files into HTML.
You will get amazing results.
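The two-step pipeline might be scripted roughly as follows. Note the substitution: instead of JODConverter, this sketch calls LibreOffice's `soffice` command line directly for the PDF step, and it assumes both `soffice` and `pdftohtml` are on the PATH. The function names are invented for the example.

```python
import subprocess
from pathlib import Path


def build_commands(doc_path, work_dir):
    """Return the two command lines: .doc -> PDF, then PDF -> HTML."""
    doc = Path(doc_path)
    work = Path(work_dir)
    pdf = work / (doc.stem + ".pdf")

    # Assumption: LibreOffice's soffice, which supports --convert-to.
    to_pdf = ["soffice", "--headless", "--convert-to", "pdf",
              "--outdir", str(work), str(doc)]
    # -noframes asks pdftohtml for a single HTML page instead of a frameset;
    # the output name is derived from the PDF name.
    to_html = ["pdftohtml", "-noframes", str(pdf)]
    return to_pdf, to_html


def doc_to_html_via_pdf(doc_path, work_dir):
    """Run both steps in order, raising if either tool fails."""
    for command in build_commands(doc_path, work_dir):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the execution makes it easy to print the commands and try them by hand before automating anything.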