Tech Help Needed: MS Word HTML to DOCX

Oh, Great Internet Guru. Your technical help is required.

I’m writing some Perl scripts at work, and one of their end goals is to generate an MS Word document that can ideally serve as a subdocument in a master document/subdocument arrangement. I’ve done master/subdocuments before with the DOCX format, and I’ve discovered that if the master is a DOC, it will look for subdocuments that are DOCs.

So far, I’ve got the script generating the variant of HTML that Word understands, so that it picks up the proper formats (i.e., I copy the prologue that defines the Word formats I need and generate HTML with the appropriate class= attributes). This gives me a standalone .htm file, which I can rename to .doc and Word handles just fine. However, if you save it, Word knows it is really HTML and creates this funky subdirectory of files that it doesn’t really use.
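
For anyone who hasn't seen Word-flavored HTML, here is a minimal sketch of the general idea, written in Python for illustration rather than the Perl used at work. Everything specific in it is an assumption: the style names (MsoNormal, MyNote), the trimmed-down prologue, and the helper names paragraph and build_document are hypothetical, and a real prologue would be copied from a file that Word itself saved as a web page.

```python
from html import escape

# Hypothetical prologue. A real one would be copied from a file that Word
# saved as "Web Page", so the style definitions match the target template.
PROLOGUE = """<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style>
p.MsoNormal {mso-style-name:"Normal"; font-family:"Times New Roman"; font-size:12pt;}
p.MyNote {mso-style-name:"MyNote"; margin-left:.5in; font-style:italic;}
</style>
</head>
<body>
"""

EPILOGUE = "</body>\n</html>\n"


def paragraph(text, style="MsoNormal"):
    """Return one paragraph tagged with the class of a (hypothetical) Word style."""
    return f'<p class="{style}">{escape(text)}</p>\n'


def build_document(items):
    """Wrap (text, style) pairs in the prologue and epilogue."""
    body = "".join(paragraph(text, style) for text, style in items)
    return PROLOGUE + body + EPILOGUE


if __name__ == "__main__":
    html_text = build_document([
        ("First paragraph in the normal style.", "MsoNormal"),
        ("A note paragraph with its own style.", "MyNote"),
    ])
    # Written as .htm; renaming the file to .doc is the workaround described above.
    with open("subdoc.htm", "w", encoding="utf-8") as handle:
        handle.write(html_text)
```

The point is only that the class on each paragraph has to match a style defined in the copied prologue; the output is still HTML underneath, whatever the extension says.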

I’ve been looking for a way to convert this .htm or .doc file into a real Word .doc or .docx file without interaction (i.e., from the command line). I tried the macro approach, and even found that Word saves the macro in that subdirectory for the .htm files. However, the security restrictions on our systems here mean that I can’t execute the macro from the MS Word command line via the /m switch. There’s a file WORDCONV.EXE in the Office12 directory, but it doesn’t seem to do anything, and I can’t find any documentation on it.

So, for those MS Word gurus out there — any ideas? I can live with what I’ve got now (the .htm file I rename); I’d just like something cleaner.
