Hi all.

I am drafting some documents whose final format has not yet been decided. Right now I am using MSWord, but the documents might become a lot more useful if available in HTML format.

By nature, I am quite lazy; I hate to do twice what needs only be done once.

Does anybody have any suggestions about how to organize the work such that I only draft the text or contents once, and then produce MSWord, HTML or other formats?

Some sort of XML-based system comes to mind, but how does one actually work with that sort of thing? Editors, file systems, the logistics, etc.

Rune
OT? Information processing
Started May 4, 2007
Reply, May 4, 2007
Hi Rune,

Of course you know I like to push LaTeX/TeX any time I can, so here's one of those times. If you wrote the initial document in LaTeX, you could convert it to Word using Chikrii Softlab's "TeX2Word":

  http://www.chikrii.com/

I have used this and it works pretty well.

You can also convert LaTeX into HTML using TeX4ht. I have not used this, but it looks very good and I am considering it for the future. An example of TeX4ht output is at

  ftp://ctan.tug.org/tex-archive/info/webguide/webguide.html

--
Randy Yates
Fuquay-Varina, NC
<yates@ieee.org>
http://home.earthlink.net/~yatescr
Reply, May 5, 2007
Use Open Office, an open source word processor that can save documents in both the formats you want: http://www.openoffice.org/

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply, May 5, 2007
On Fri, 04 May 2007 23:12:32 -0400, Jerry Avins <jya@ieee.org> wrote in comp.dsp:

> Use Open Office, an open source word processor that can save documents
> in both the formats you want. http://www.openoffice.org/

I would have suggested Open Office, but Jerry beat me to it. I just want to add that Open Office can output PDF files directly, which gives you a third format option.

--
Jack Klein
Home: http://JK-Technology.Com
Reply, May 5, 2007
Rune Allnor wrote:

> Does anybody have any suggestions about how to organize the work such
> that I only draft the text or contents once, and then produce MSWord,
> HTML or other formats?

Rune,

If PDF is OK as the final format, you can use pdf24 to create the PDF file(s). It is freeware, available at:

  <http://www.pdf24.org/en/pdf24-creator.htm>

The advantage is that you can use *anything* to create content. Anything that can send output to a printer can be used by pdf24 to produce a PDF file: graphics, text, spreadsheets, whatever.
Reply, May 5, 2007
Hi Rune,

I have successfully used LaTeX tools to generate all kinds of documents (PS, PDF, HTML, RTF, ...). It is a good option, as you get incredibly well-typeset equations (which are converted to GIFs for Word and HTML). I also have a Makefile that automates everything. If you are interested, I would be glad to e-mail it to you.

-Vijay.
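Vijay's Makefile was not posted, so the following is only a guess at the idea behind such a setup. The core rule make applies is: rebuild a target only when it is missing or older than one of its sources. A minimal Python sketch of that rule follows; the file names are hypothetical, and the commands assume that pdflatex and TeX4ht's htlatex are installed. A real Makefile does the same job with less code.

```python
import os
import subprocess
from typing import Callable, List, Sequence, Tuple

# One build step: (target, sources, command).
Rule = Tuple[str, Sequence[str], List[str]]


def needs_rebuild(target: str, sources: Sequence[str]) -> bool:
    """make's rule: rebuild if the target is missing or older than any source."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)


def run_checked(command: List[str]) -> None:
    """Run an external tool; raises CalledProcessError if it fails."""
    subprocess.run(command, check=True)


def build(rules: Sequence[Rule],
          run: Callable[[List[str]], object] = run_checked) -> List[str]:
    """Run every out-of-date rule in order; return the rebuilt targets."""
    rebuilt = []
    for target, sources, command in rules:
        if needs_rebuild(target, sources):
            run(command)
            rebuilt.append(target)
    return rebuilt


# Hypothetical rules: one LaTeX source, two output formats.
# The tool names are assumptions; build(RULES) would invoke them.
RULES: List[Rule] = [
    ("report.pdf", ["report.tex"], ["pdflatex", "report.tex"]),
    ("report.html", ["report.tex"], ["htlatex", "report.tex"]),
]
```

Assuming both tools succeed, calling build(RULES) after editing report.tex regenerates both outputs, and a second call with nothing changed runs nothing. The run parameter exists so the logic can be tried without the TeX tools installed.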
Reply, May 5, 2007
"Rune Allnor" wrote:

> I am drafting some documents whose final format has not
> yet been decided. Right now I am using MSWord, but the
> documents might become a lot more useful if available
> in HTML format.

Word can also produce HTML output (although the output is bloated and ugly), and with the proper printer driver you can also produce PDF.

OpenOffice can convert Word files to other formats, so you might want to look at that too. But make sure that your version of OO understands the features you use in your version of Word; sometimes the conversion doesn't come out right.

Regards
Martin
Reply, May 5, 2007
Hi Rune,

I found that for organizing projects (and work flow), one single file format is just not enough. You typically have source code, documentation in some proprietary word-processor format (perhaps Word), databases, LaTeX stuff, spreadsheets, papers and articles in PDF, plus some notes or scans in some graphics format, and maybe even links to websites, web services or web articles.

Also, typically, several people are working on and editing several different or possibly overlapping parts of a project. Some kind of sharing mechanism is needed (which is not supported for Word documents, for example).

We found dokuwiki (http://wiki.splitbrain.org/wiki:dokuwiki) quite useful for such tasks. Wiki syntax is simple and flexible, and the source files are in ASCII text, stored differentially (the whole editing history is always available). As others have said, creating PDFs is possible from almost every other format simply by providing a PDF printer driver.

If you just have to draft a single document and want to be flexible with the output format (slides, articles, HTML, etc.), I would also suggest LaTeX.

Regards
Andor
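Andor mentions that the wiki keeps page sources "stored differentially". One common way to do this (RCS works this way, for example) is to keep the newest revision in full and store each older revision as a delta against its successor. The following is a minimal Python sketch of that idea using the standard difflib module; it illustrates differential storage in general and is not a description of how DokuWiki actually stores its page history.

```python
import difflib
from typing import List, Tuple, Union

# A delta op either copies a slice of the newer text or inserts literal lines.
Op = Union[Tuple[str, int, int], Tuple[str, List[str]]]


def make_delta(newer: List[str], older: List[str]) -> List[Op]:
    """Describe how to rebuild `older` from `newer`, storing only changed lines."""
    matcher = difflib.SequenceMatcher(a=newer, b=older, autojunk=False)
    delta: List[Op] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))          # reuse newer[i1:i2]
        else:
            # replace/delete/insert: skip newer[i1:i2], emit older[j1:j2]
            delta.append(("insert", older[j1:j2]))
    return delta


def apply_delta(newer: List[str], delta: List[Op]) -> List[str]:
    """Rebuild the older revision from the newer one."""
    out: List[str] = []
    for op in delta:
        if op[0] == "copy":
            out.extend(newer[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class RevisionStore:
    """Newest revision in full, older ones as reverse deltas (RCS-style)."""

    def __init__(self, text: str):
        self.head = text.splitlines(keepends=True)
        self.back: List[List[Op]] = []  # back[n] rebuilds revision n from n+1

    def commit(self, text: str) -> None:
        newer = text.splitlines(keepends=True)
        self.back.append(make_delta(newer, self.head))
        self.head = newer

    def revision(self, n: int) -> str:
        """Return revision n (0 is the first), walking back from the head."""
        lines = self.head
        for delta in reversed(self.back[n:]):
            lines = apply_delta(lines, delta)
        return "".join(lines)


store = RevisionStore("a\nb\nc\n")
store.commit("a\nB\nc\nd\n")
store.commit("a\nB\nd\n")
print(store.revision(0), end="")  # prints the first version: a, b, c
```

Keeping the newest text in full makes the common case (reading the current page) cheap, while older revisions cost one delta application per step back.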
Reply, May 5, 2007
Rune Allnor <allnor@tele.ntnu.no> writes:

> ...drafting some documents ...
> Right now I am using MSWord, but the documents might become a lot
> more useful if available in HTML format.
>
> By nature, I am quite lazy; I hate to do twice what needs only be done
> once.
>
> Does anybody have any suggestions about how to organize the work such
> that I only draft the text or contents once, and then produce MSWord,
> HTML or other formats?

Rune, here you are bringing up a subject that has been my main side-project for many years.

FYI: Decades ago I did my diploma thesis in theoretical nuclear physics and then became a software engineer, working chiefly on technical software.

I worked in several groups in industry that ultimately failed to deliver a really useful product. The more I thought about the reasons for these failures, the more I became convinced that the cause was a lack of easily readable documentation. Thus, when I started as a freelancer, I knew what to look for.

Here is a list of what I think is important:

a) Write literate programs (see the FAQ from comp.programming.literate, for example). This is a way of interleaving chunks of program code and documentation that was introduced by D. E. Knuth, who demonstrated the usefulness of this style in his implementation of TeX.

b) Use semantic markup instead of literal markup. Semantic markup is sometimes also called ``logical markup''.
Here is a passage from one of my documentations that explains these terms:

  When speaking about documentation systems, we have to be aware of two pairs of choices: literal markup versus semantic markup, and direct manipulation versus source code manipulation.[1]

  The distinction of literal markup versus semantic markup is based on what the file representing the document contains: a specification of how special parts of the document should look in print, or of what their semantics is from the document author's point of view.

  The distinction of direct manipulation versus source code manipulation is based on the way the writer of the document works: he may view the document on screen similar to what it would look like in print and manipulate the appearance of the document (this is widely called WYSIWYG), or he may edit a source code of the document, explicitly inserting control sequences.

  Obviously direct manipulation harmonizes well with literal markup,[2] but it also allows semantic markup; it just becomes a little less obvious to the writer what semantics he claimed for some special part of the document. Source code manipulation, on the other hand, is neutral with respect to semantic or literal markup.

  [1] The first pair of terms is introduced in documents on SGML, IIRC, and the second pair in the documentation for makeindex by Cheng et al.
  [2] And probably it was this relationship that caused Brian Kernighan to say: WYSIWYG is WYSIWAG.

Here is another citation, from the LaTeX manpage:

  The LaTeX macros encourage writers to think about the content of their documents, rather than the form. The ideal, very difficult to realize, is to have no formatting commands (like ``switch to italic'' or ``skip 2 picas'') in the document at all; instead, everything is done by specific markup instructions: ``emphasize'', ``start a section''.

Some more comments:

Kernighan's acronym WYSIWAG stands for ``What you see is _all_ you get''.
There is an easily readable book by a technical author with a title like ``ABCD ... SGML'' (I have lent it to somebody; the title really includes the dots, IIRC). She explains that it just is not worth the work to do all the markup (even if it is only selecting some region of text and then a menu item) for the sole purpose of changing the font locally. It seems the OP realized exactly this. When the markup is semantic, the author really puts knowledge into the document, namely information on the semantics of some text, and this information should be preserved in the document; this way we have the option to extract it mechanically at some later stage and reuse it in ways we might not yet think of.

c) NEVER use a system where the format of your file is not publicly documented. How would you process your document with an arbitrary tool? Who guarantees you will have the software needed to read your document in a decade? I read de.comp.text.tex, a newsgroup with lots of typographers doing serious work, some of it with MS Word. They recently reported that it is not uncommon for an MS Word document to become totally useless after a few years: some bit seems to have flipped, and MS Word fails completely on the file, giving you no chance to rescue even part of it.

d) LaTeX. I don't use it, but it would be the first thing I would suggest to, say, an engineering student who just wants to write a thesis or the like.

e) TeX. This is a text formatter. There are several ``improved'' implementations, but what I say here refers to Knuth's original work, which is still widely used. First: TeX is a masterpiece of software engineering. Indeed, Knuth states somewhere that he wanted to write a not-so-small program which a ``professor of computer science can be proud of''. TeX includes a powerful macro facility, and this is the basis for macro packages like LaTeX and Texinfo. But be warned: TeX programming is difficult. Experience with ``normal'' programming languages such as C does not help; it's more of a handicap.
It took me about five years (as a side-project) to become really comfortable with TeX macros. What probably helped me most was that I learned other languages like Forth, PostScript, and Scheme.

f) Texinfo. This is the GNU documentation system, implemented on top of TeX. AFAIK it was originally modeled on the Scribe text formatter, but when Scribe became a commercial product, Richard Stallman hacked a replacement on top of TeX. Texinfo has two advantages over LaTeX: 1) it is semantic markup; 2) its syntax is much more readable, with much less ``visual noise''. The disadvantage is that Texinfo is rather static: there is no easy way to get something that does not look like GNU documentation.

g) WoTAn. (That's the German name of Odin.) No, that's the thing I currently use. It stands for ``Wolfs TeX Anwendung'', which translates to ``Wolf's TeX application''. ;-) It started out from Texinfo, which I extended and finally rewrote to do all I need when it comes to text formatting. Though I would like to publish it under the GPL, I have not done so, for several reasons: 1) Though well documented (it is a literate program), it has grown organically over the years and still includes code that just demonstrates how not to program TeX. 2) It's still alpha; if I think I have a better idea on something, I implement it and change the roughly two thousand pages of maintained documentation I have, if necessary. (Which never took me more than a few minutes for, say, a 300-page document.) 3) As with LaTeX and Texinfo, it's an abuse of the TeX macro facility, and it should be totally replaced, following established principles of language processor design. Really! But I lack the time to do so.

h) XML, SGML.

> Some sort of XML-based system comes to mind, but how does one actually
> work with that sort of thing? Editors, file systems, the logistics, etc.

I wanted to specialize in SGML in the nineties, and I have partially studied Goldfarb's SGML Handbook. SGML is a meta language.
This means that it is a language for specifying markup languages. The essential thing here is that there exist parser generators that can read the SGML specification of a markup language and generate a parser for that language. (Compare this to specifying, e.g., the C syntax for YACC or GNU Bison.) A well-known and simple example of SGML usage is HTML: today, it is specified using SGML.

The problem with SGML is a feature called ``tag minimization''. While ``elements'' (I assume this term is well enough known to readers of comp.dsp from HTML) ``normally'' should be enclosed by a start tag and an end tag, either of these tags can be omitted ``if it can be inferred from the context that it should be there''. (Goldfarb comes from law school, not from math or engineering.) This introduced problems in language and software design that ``ordinary'' programmers can't be expected to master. Therefore, AFAIK, XML was proposed. This is a subset of SGML that does not allow tag minimization. Thus HTML, which relies on tag minimization, can't be specified with XML. I looked only superficially at XML, but AFAIK what I will now say about SGML applies to XML as well.

Syntactically, SGML defines just s-expressions (symbolic expressions, sexprs), as the Lisp people call them. That's nothing other than a fully parenthesized Polish notation, as Lisp or Scheme use. But unlike those functional languages, SGML does not let you specify any processing. This is left entirely to whatever application happens to process the document written in an SGML-specified markup language. In consequence, SGML is full of ``features'' intended to substitute for processing power ``in the necessary cases''. Featureism ...
Instead of trying to explain further myself, I will cite a brilliant guy whose opinion totally matched my own impression, so much so that I left the SGML thing.

> Date: 09 Jun 1999 03:45:06 +0000
> From: Erik Naggum <erik@naggum.no>
> Message-ID: <3137888706865673@naggum.no>
> References: <m3so82qwg8.fsf_-_@world.std.com>
> Subject: Re: Lisp syntax, what about resynchronization?
>
> * Tom Breton <tob@world.std.com>
> | I agree with most of the comments on this thread about Lisp syntax.
> | But it occurs to me that one advantage that heterogeneous Algol-type
> | syntaxes have over Lisp is that when they get lost, they can detect
> | being lost and resynchronize (And thus produce more errors, but that's
> | helpful for debugging). In Lisp, one misplaced parenthesis can easily
> | put you into "Where the hell is this problem coming from?" mode.
>
> that's why we have editors instead of compilers help us find problems.
> one very easy way to spot parenthesis errors is to let Emacs indent the
> whole top-level form. if something moves, undo the indentation, and fix
> whatever you fairly immediately see caused the problem. repeat until
> nothing moves.
>
> | (*) Actually, way back when someone here was saying that XML was "Lisp
> | with brackets" (which I don't entirely buy, but Lisp syntax for
> | markup... yum). Where resynchronization in a programming language
> | just lets you debug more easily, in a markup language it can make all
> | the difference between a document that will render and one that won't.
>
> having been one of the leading SGML experts in the world before I finally
> came to conclude it was a fundamentally braindamaged approach (but a good
> solution once you had taken the wrong approach and stuck with it -- like
> so many serious design errors in programming, or, indeed, politics, where
> it is always harder to get back on track than to continue forward and to
> knock down ever more hindrances -- it's like driving a tank: if you drift
> off the road, any telephone poles you might knock down are only proof
> positive of your mighty tank's ability to get where you want to go, and
> the important psychological "corrector" that hindrances should have been
> is purposefully ignored because you are too powerful), I could go into a
> long and arduous debate over exactly how little resynchronization ability
> SGML's "labeled parentheses" gives you, and how immensely hard it is to
> backtrack the SGML parsing process. it's a pity that XML is actually a
> little _worse_ in this regard.
>
> one of the reasons I got into SGML was that it had beautiful, explicit
> markers of the beginning and end of the syntactic structure. this I
> sensed was great because I had had a lot of Lisp exposure before SGML.
> however, Lisp's elegance loses all its beauty if you adorn parentheses
> with labels, because you actually _lose_ synchronization ability when you
> have to deal with conflated errors: <BAR>...<BAZ>...</BAR> may close BAZ
> and BAR at the same time in the complex game of omitted end tags in SGML,
> but in the fully explicit case, it may be a typo for </BAZ>, or may be a
> missing </BAZ>. any attempt to resynchronize will get it right half the
> time, which causes subsequent errors to crop up, and then you'd have to
> go back and try the other possibility, but that means reshuffling your
> entire tree structure. the same is true of empty elements that are
> mistaken for containers. perhaps <BAZ> should have been <BAZ/>?
> that means you get it right only a _third_ of the time in XML where you don't
> even have the DTD to help you decide anything, anymore. 'tis wondrous!

[Rest of this article deleted by HW.]

You may find lots more on SGML by Erik Naggum on the net, and he has a sharp blade.

Indeed, there is no doubt that you can specify any document you could ever want using SGML. Disk space aside (it is cheap enough these days), the only problem is whether you have the tools to handle this ``mess'' in an enjoyable way.

Rune, if you are seriously interested in SGML/XML, you might look at the Linux Documentation Project, which deals with how to write Linux-related documentation. AFAIK they migrated to an SGML-based file format a few years ago and assembled a tool suite to deal with this baroque syntax.

i) Roff. I don't want to forget the roff family of text formatters, if only because roff is why development of Unix was originally permitted. They are lightweight, fast, and sufficient for technical reports. But they are little used these days.

Finally, there is the OP's requirement to convert to MS Word.

I can't say much about how to convert a TeX-based (or similar) document to MS Word. Indeed, I wonder why you would want this. Do your customers insist on MS? Wouldn't they be satisfied with PDF? From what I have occasionally read in de.comp.text.tex, it is at least difficult to get a Word document from a TeX document that looks close to the original, except by importing the individual pages as graphics. But at the level of, e.g., RTF there would be no problem.

The above has been a survey of the topics related to the OP's question, as far as I can see them at the moment.

In summary, IMO the essential thing is to write your documentation in a well-defined format that allows semantic markup. Then you are free to produce print, HTML, or whatever else you want mechanically, provided it's not proprietary stuff.

--
hw
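Heinrich's point b) and his closing summary can be illustrated with a small Python sketch. The source uses a made-up semantic vocabulary (para, term, emph, code; not any real DTD), and each output format is just a table saying how each kind of phrase should look. The same marked-up text is rendered to HTML and to LaTeX, and the term markup is then reused to build a glossary, the kind of mechanical reuse he describes. Only the standard library is assumed.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical semantic tags: each says what a phrase *is*, not how it looks.
SOURCE = ("<para>With <term>semantic markup</term> the author states "
          "<emph>what</emph> a phrase is, e.g. a program name such as "
          "<code>makeindex</code>; a style sheet decides how it looks.</para>")

# Characters that are special in LaTeX, mapped in one pass (no double escaping).
_LATEX_SPECIALS = str.maketrans({
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
})

# One "style sheet" per output format: an escaping function plus,
# for each semantic tag, the text to put before and after its content.
STYLES = {
    "html": (html.escape, {
        "para": ("<p>", "</p>"),
        "term": ("<dfn>", "</dfn>"),
        "emph": ("<em>", "</em>"),
        "code": ("<code>", "</code>"),
    }),
    "latex": (lambda s: s.translate(_LATEX_SPECIALS), {
        "para": ("", "\n"),
        "term": (r"\textit{", "}"),
        "emph": (r"\emph{", "}"),
        "code": (r"\texttt{", "}"),
    }),
}


def render(elem: ET.Element, fmt: str) -> str:
    """Render an element tree using the style sheet for `fmt`."""
    escape, tags = STYLES[fmt]
    start, end = tags[elem.tag]
    parts = [start, escape(elem.text or "")]
    for child in elem:
        parts.append(render(child, fmt))
        parts.append(escape(child.tail or ""))  # text after the child's end tag
    parts.append(end)
    return "".join(parts)


def glossary(root: ET.Element) -> list:
    """Reuse the markup for something else: collect every marked-up term."""
    return sorted({t.text for t in root.iter("term")})


root = ET.fromstring(SOURCE)
print(render(root, "html"))
print(render(root, "latex"))
print(glossary(root))
```

Changing how terms look in every document means editing one entry in STYLES, not the documents; with literal markup (``italic'' instead of ``term'') the glossary could not be extracted at all.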
Reply, May 5, 2007
On 5 May, 16:44, Heinrich Wolf <hwmu...@willis-werkstaette.de> wrote:

> I worked in several groups in the industry that finally failed to
> deliver a really useful product. The more I thought over where the
> reasons of these failures were, the more I became convinced that it was
> lack of easily readable documentation. Thus, when starting as a
> freelancer, I knew what to look for.

As they say, "the devil is in the details." I missed the main objective of my PhD thesis because the polarity of some seismic sensors was never logged.

> Here is a list of what I think is important:
>
> a) Write literate programs

...

> b) Use semantic markup instead of literal markup.

...

> Kernighan's acronym WYSIWAG stands for ``What you see is _all_ you
> get''.

You need to have worked a bit with these sorts of things to see the impact of that one. I like it!

> the author
> really puts knowledge into the document--- info on semantics of some
> text, and this information should be conserved in the document; this
> way we have the option to extract it mechanically at some later stage
> and reuse it in ways we currently might not yet think of.
>
> c) NEVER use a system, where the format of your file is not documented
> to the public.
> How would you exploit your document with an arbitrary
> tool? Who guarantees you will have the software needed to read your
> document in a decade?

The two arguments above are my reasons for asking the question. My employer demands that we use MSWord, while I can see that at least some of the documents would be a lot more accessible and useful in HTML form.

I wrote a term paper some 20 years ago in WP 5.1. I still have the 5 1/4" floppy disk lying around, somewhere...

> I am reading de.comp.text.tex, a NG where lots of typographers are,
> doing serious work, also with MS Word. They recently reported that it
> is not uncommon that a MS Word document has become totally useless
> after a few years, some bit seems to have flipped and MS Word will
> totally fail on the file, giving you no chance to rescue a part.
>
> d) LaTeX. I don't use it but it would be the first thing I would
> suggest to--- say--- a student of engineering who just wants to write
> a thesis or such.

I have used it for personal stuff for the last 15 years. It is not an option at my current workplace.

...

> h) XML, SGML.
>
> > Some sort of XML-based system comes to mind, but how does one
> > actually
> > work with that sort of thing? Editors, file systems, the logistics,
> > etc.

...

> I will cite a brilliant guy, whose opinion totally
> matched my own impression, such that I left the SGML thing.
>
> > Date: 09 Jun 1999 03:45:06 +0000
> > From: Erik Naggum <e...@naggum.no>
> > Message-ID: <3137888706865...@naggum.no>
> > References: <m3so82qwg8.fsf...@world.std.com>
> > Subject: Re: Lisp syntax, what about resynchronization?

...

> > one of the reasons I got into SGML was that it had beautiful, explicit
> > markers of the beginning and end of the syntactic structure. this I
> > sensed was great because I had had a lot of Lisp exposure before SGML.
> > however, Lisp's elegance loses all its beauty if you adorn parentheses
> > with labels, because you actually _lose_ synchronization ability when you
> > have to deal with conflated errors: <BAR>...<BAZ>...</BAR> may close BAZ
> > and BAR at the same time in the complex game of omitted end tags in SGML,
> > but in the fully explicit case, it may be a typo for </BAZ>, or may be a
> > missing </BAZ>.

...

> You may find lots more on SGML by Erik Naggum on the net--- and he
> has a sharp blade.

Ah. Interesting comments there. They go a long way to explain why so few XML document authenticator programs are available.

...

> Finally, there is the OP's requirement to convert to MS Word.
>
> I can't say much about how to convert from a TeX etc. based document
> to MS Word. Indeed, I am wondering why you would want this. Do your
> customers insist on MS? Wouldn't they be satisfied with PDF?

For better or for worse, MSOffice is the de facto standard where I work. Everything is stored and communicated as either MSWord or Excel files. I don't mind the format as such, but handling all the info is a pain. Extracting everything to XML format would let me handle the info once and then use a parser/translator to generate exactly the type of document I want, as well as write background programs that could sync documents and databases etc.

...

> In summary, IMO the essential thing is, to write your documentation in
> a well defined format, that allows semantic markup. Then you are free
> to produce mechanically print, HTML, or whatever else you want---
> provided, that's no proprietary stuff.

Exactly.

Rune
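Rune's plan (extract the content to XML once, then generate documents and keep databases in sync from it) can be sketched in Python with the standard library alone. Everything here is illustrative: the <sensors>/<sensor> vocabulary and its attributes are hypothetical, the HTML is a bare table, and an in-memory SQLite database stands in for whatever database would really be kept in sync. The last loop tries Naggum's <BAR>...<BAZ>...</BAR> example: the XML parser rejects it and reports a position, but it cannot tell which repair was meant.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single source: every output below is generated from it.
SOURCE = """<sensors>
  <sensor id="S1" type="geophone" polarity="normal"/>
  <sensor id="S2" type="hydrophone" polarity="reversed"/>
</sensors>"""


def parse(text: str) -> ET.Element:
    """Parse the source. XML has no tag minimization, so a missing or
    mistyped end tag is a hard error, reported with its (line, column)."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        raise ValueError(f"not well-formed at line {line}, column {column}") from err


def to_html(root: ET.Element) -> str:
    """Render every <sensor> as a table row (attributes assumed present)."""
    header = "<tr><th>ID</th><th>Type</th><th>Polarity</th></tr>"
    rows = []
    for s in root.iter("sensor"):
        cells = "".join(f"<td>{escape(s.get(k))}</td>"
                        for k in ("id", "type", "polarity"))
        rows.append(f"<tr>{cells}</tr>")
    return f"<table>{header}{''.join(rows)}</table>"


def sync_db(root: ET.Element, conn: sqlite3.Connection) -> None:
    """Insert or update one row per <sensor>, keyed by id.
    (Rows for sensors removed from the document are not deleted here.)"""
    conn.execute("CREATE TABLE IF NOT EXISTS sensor "
                 "(id TEXT PRIMARY KEY, type TEXT, polarity TEXT)")
    conn.executemany("INSERT OR REPLACE INTO sensor VALUES (?, ?, ?)",
                     [(s.get("id"), s.get("type"), s.get("polarity"))
                      for s in root.iter("sensor")])
    conn.commit()


root = parse(SOURCE)
print(to_html(root))
conn = sqlite3.connect(":memory:")
sync_db(root, conn)
print(conn.execute("SELECT id, polarity FROM sensor ORDER BY id").fetchall())

# Naggum's example: the first string is rejected, yet both repairs parse,
# so the parser can locate the problem but cannot choose the fix.
for candidate in ("<BAR>x<BAZ>y</BAR>",
                  "<BAR>x<BAZ>y</BAZ></BAR>",   # missing </BAZ> added
                  "<BAR>x<BAZ/>y</BAR>"):       # BAZ was meant to be empty
    try:
        parse(candidate)
        print(candidate, "-> well-formed")
    except ValueError as err:
        print(candidate, "->", err)
```

Because the HTML and the database rows come from the same parsed tree, they cannot drift apart; rerunning the script after editing the XML regenerates both.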






