Saturday, April 28, 2007

Converting .pdf file (Adobe Acrobat) to .doc file (MS Word)

In the last post I mentioned the way I planned to get a .doc out of the .pdf I had.

After a few more online searches, I was thrilled to learn that there were many more options for me, all thanks to the LaTeX community. This page has all the details: http://www.tug.org/utilities/texconv/textopc.html
Many options are listed on that site. I describe a few of them below and have added some other ideas.

If you have a .tex file and want a .doc file at the end, you can choose one of these methods:
  1. .tex --> .pdf --> (pdftotext) .txt --> (copy & paste) .doc
  2. .tex --> (TeX4ht) .html --> (copy & paste) .doc
  3. .tex --> (LaTeX2rtf) .rtf --> (MS Word/Open Office) .doc
  4. .pdf --> (Adobe Acrobat Standard - use 'save as' option) .doc
  5. .??? --> (Google Documents - convert to .doc) .doc


    I have used the 1st option many times before. It always requires some cleanup after the .doc has been created.

    I used the 2nd option for my work this time. The results are nice. No cleanup is required for text, though tables and figures need some. TeX4ht can also be run with the options "html,word" "symbol/!" "-cvalidate", which supposedly tune the .html output for MS Word. I haven't tried those options yet.

    The 3rd option seems interesting. Should try it.

    The 4th option caught me off guard, since I had not thought of it. I tried it, but Adobe Acrobat did a very bad job. Text and tables require a lot of cleanup, while figures need none.

    One of my friends recommended the 5th option. I have not tried it fully yet. I tried uploading a .xml file (created with TeX4ht) to Google Documents, but Google does not accept .xml files yet.
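
    For reference, the first three routes in the list above can be sketched as command lines. This is only a rough sketch: the function name conversion_commands and the file name paper.tex are made up for illustration, the tools (pdflatex, pdftotext, htlatex, latex2rtf) are assumed to be installed, and the script only prints the commands instead of running them.

```python
import shlex
from pathlib import Path


def conversion_commands(tex_file, route, word_tuned=False):
    """Return the commands (as argv lists) for routes 1-3 from the list above.

    Hypothetical helper: it only builds the command lines, it does not run them.
    """
    tex = Path(tex_file)
    stem = tex.stem
    if route == 1:
        # .tex --> .pdf --> .txt (a document with a bibliography would also
        # need bibtex and extra pdflatex passes, which are left out here)
        return [
            ["pdflatex", tex.name],
            ["pdftotext", "-layout", f"{stem}.pdf", f"{stem}.txt"],
        ]
    if route == 2:
        # .tex --> .html with TeX4ht's htlatex wrapper
        cmd = ["htlatex", tex.name]
        if word_tuned:
            # The three extra options quoted in the post, untested there
            cmd += ["html,word", "symbol/!", "-cvalidate"]
        return [cmd]
    if route == 3:
        # .tex --> .rtf; latex2rtf names the output after the input file
        return [["latex2rtf", tex.name]]
    raise ValueError("only routes 1, 2 and 3 are command-line routes")


if __name__ == "__main__":
    for route in (1, 2, 3):
        for cmd in conversion_commands("paper.tex", route, word_tuned=True):
            print(f"route {route}: {shlex.join(cmd)}")
```

    Routes 4 and 5 go through a GUI or a web page, so they are left out of the sketch.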


    In conclusion, I really must thank the selfless efforts of all the people who have created these amazing open source tools.

    Do refer to http://www.tug.org/utilities/texconv/textopc.html
    for all the options.


Friday, April 27, 2007

Creating American Psychological Association (APA) Style documents

For a book that I am contributing to, I have to submit a .doc file (ugh!) which follows the American Psychological Association (APA) style. Just getting the 70+ references to follow this style was a nightmare for me. With no help from the APA or the MS Word community (if one exists!), I had to turn to LaTeX to help me out.

I used the Apacite package. I thought that, as usual, it would be a relatively simple task with LaTeX, but this time LaTeX really took up a lot of my time.

I had forgotten how to install packages on a Linux machine, so I first had to re-learn that. Once I had that working, it just seemed impossible to get the references working!

First, I had to make sure that all the URLs were wrapped in the \url command. (Before that, it took me many hours to realise that I had forgotten the \usepackage{url} line in the .tex file, and LaTeX had not specifically complained about it!) Then I added the \bibnodot{.} command to the end of each URL.

Even after these steps, which made my .bib file look as similar as possible to the sample .bib file provided for Apacite on CTAN, I still could not get it working. The worst part was trying to debug the "Runaway argument"!

I got an error which was similar to:
Runaway argument?
\bibitem
! File ended while scanning use of \@tempa.

\par


After some googling, and after finding some useful sites for debugging common LaTeX errors, I still was not making any progress. I had the idea of replacing all the tab indents (hex 09) in my .bib file with good old spaces (hex 20), thinking that it would help. I don't know whether it did or not, but by this time I was able to get a few references to show up.
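
The tab replacement can be scripted instead of done by hand. Here is a minimal sketch, assuming the .bib file is UTF-8 text; the function names replace_tabs and clean_bib_file are made up for this example, and (as said above) I cannot promise that the replacement is what fixed anything.

```python
from pathlib import Path


def replace_tabs(text, spaces_per_tab=1):
    """Replace every tab (hex 09) with spaces (hex 20)."""
    return text.replace("\t", " " * spaces_per_tab)


def clean_bib_file(path, spaces_per_tab=1):
    """Rewrite a .bib file in place and return how many tabs were replaced.

    Hypothetical helper: keep a backup copy of the file before running it.
    """
    bib = Path(path)
    original = bib.read_text(encoding="utf-8")
    cleaned = replace_tabs(original, spaces_per_tab)
    if cleaned != original:
        bib.write_text(cleaned, encoding="utf-8")
    return original.count("\t")
```

Calling clean_bib_file("references.bib") would rewrite that (hypothetical) file and report the number of tabs it found.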

In the end, simply rewriting some of the BibTeX entries (at the places where LaTeX complained) helped, and the invisible errors were gone at last!

After hours and hours of working on the format of the document, I finally got a .pdf file which followed the APA style guidelines.

((UPDATE-START))
A lot of errors such as this:

?
! Undefined control sequence.
l.1072 ...ves/sum2005/entries/turing-test/}\bibnodot
%
were flagged both on the command line and in the GUI (Kile). Just ignoring them still produced the .pdf file.
((UPDATE-END))

BUT, LaTeX's promise was the EXACT OPPOSITE!

LaTeX promises that the writer does not have to worry about the style and can concentrate on the content only. It promises the flexibility of switching between styles just by changing a few lines or so. Well, LaTeX did not deliver on that promise for me in this case.

But if I had taken the other route of working on the style while using a Word document, surely I would have gone mad by now and would not even have got close to finishing the task.

Now I have a few more simple steps:

  1. convert pdf to text (pdftotext)

  2. copy the text to MS Word (or Open Office)

  3. do the final edits since pdftotext would not have done a neat job

  4. paste all the figures and create tables and

  5. that is it!
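
Part of the cleanup in step 3 can be automated before pasting the text into Word. The sketch below is only a rough starting point, under the assumption that the pdftotext output is hard-wrapped prose with blank lines between paragraphs; the function name tidy_pdftotext is made up, and the result still needs a read-through by hand.

```python
import re


def tidy_pdftotext(text):
    """Reflow hard-wrapped pdftotext output into one line per paragraph."""
    # pdftotext marks page breaks with a form feed; treat it as a line break
    text = text.replace("\f", "\n")
    # Rejoin words split across lines, e.g. "conver-\nsion" -> "conversion".
    # Caveat: this also glues real compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Squeeze three or more newlines down to a single paragraph break
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Turn single line breaks inside a paragraph into spaces
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

With this, paragraphs arrive in Word as single lines, so Word can wrap them itself instead of keeping the line breaks from the .pdf.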

