Microsoft Word and the web
In short...
Try to avoid using Word to create web pages and don't paste text from Word directly into a web page.
What's the problem?
The longer version...
Word was never primarily intended to be a web-authoring tool, and it writes enormous quantities of
unnecessary code. Web pages written in Word can:
- be four or five times as big (and slow) as necessary;
- be difficult to edit;
- cause accessibility problems.
Analysis of some pages - approaches to avoid
All the following pages were created from the same two-page Word document to
illustrate some of the problems different approaches can generate.
Converting to web pages
To be avoided: Word2000sample.htm
[file size: 15kb + 3kb]
Created in Word using Save As Web Page.
Pro: easy; looks just like the original Word document.
Con: Word creates a sub-folder containing "supporting files"; if you put the
page on the server without these extra files, the page will not display
correctly. The files are large (four or five times as big, and as slow, as
needed) because of bloated "Office-specific" HTML code (view the source of the
file linked above to see what I mean!). Confusingly for many, the page will open
in Word for editing, even when the web author finds it through SharePoint
Designer or FrontPage. If you do manage to open the file in SharePoint Designer,
FrontPage or another web authoring program, the extra code makes page-editing
and formatting difficult. Formatting is fixed, so users are unable to resize
text in a web browser, leading to accessibility problems for those with visual
impairments.
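To give a feel for what the "Office-specific" code looks like, here is a minimal Python sketch that strips some of the more common kinds of Word markup from a fragment of HTML. The function name strip_word_markup and the sample fragment are my own illustrations, not taken from the sample file, and the patterns are assumptions about typical Word output (conditional comments, namespaced tags such as <o:p>, Mso classes and mso- style properties). Regular expressions are a crude tool for HTML, so treat this as an illustration rather than a reliable cleaner.

```python
import re

def strip_word_markup(html: str) -> str:
    """Remove some typical Word-generated markup from an HTML string.

    A rough sketch only: the patterns below are assumptions about common
    Word output, and regex-based clean-up will miss awkward cases.
    """
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Namespaced Office tags such as <o:p> and </o:p> (any inner text is kept)
    html = re.sub(r"</?[a-z][a-z0-9]*:[^>]*>", "", html, flags=re.IGNORECASE)
    # class attributes whose value starts with "Mso", e.g. class=MsoNormal
    html = re.sub(r"\s+class=(\"Mso[^\"]*\"|'Mso[^']*'|Mso\w+)", "", html,
                  flags=re.IGNORECASE)
    # Quoted style attributes containing an mso- property. The whole
    # attribute is dropped, so any ordinary CSS in it is lost too.
    html = re.sub(r"\s+style=(\"[^\"]*mso-[^\"]*\"|'[^']*mso-[^']*')", "", html,
                  flags=re.IGNORECASE)
    return html

# Hypothetical fragment in the style Word tends to produce
sample = "<p class=MsoNormal style='mso-margin-top-alt:auto'>Hello<o:p></o:p></p>"
print(strip_word_markup(sample))  # prints <p>Hello</p>
```

Even in this tiny fragment, more than half of the characters are markup that the browser does not need.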
To be avoided: Word2000_paste.htm
[file size: 10kb]
Created by copying text in Word and pasting into a blank
page in SharePoint Designer or FrontPage.
Pro: not much...
Con: the file is smaller than with Save As Web Page, but still about twice as
big as necessary, because pasting still brings the junk code, with the
consequent problems outlined above. The Remove Formatting option in SharePoint
Designer or FrontPage doesn't remove the junk.
Word2000_from_doc.htm [file size: 5kb]
Created by inserting the contents of the Word document into a blank page in
SharePoint Designer or FrontPage (choose Insert File from the Insert menu).
Pro: easy when you know how; small files without the junk code; few
subsequent editing problems; HTML generated is not perfect, but not bad.
Con: it generates lots of <FONT> tags, which can cause subsequent
formatting problems.
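If the <FONT> tags do get in the way, they can be removed after the event. The following is a minimal Python sketch, assuming you have the page's HTML as a string; the function name strip_font_tags and the sample markup are hypothetical. It removes the opening and closing tags but keeps the text between them.

```python
import re

def strip_font_tags(html: str) -> str:
    """Remove <font ...> and </font> tags, keeping the text they wrap."""
    # \b stops the pattern matching other tag names that merely start with "font"
    return re.sub(r"</?font\b[^>]*>", "", html, flags=re.IGNORECASE)

# Hypothetical fragment for illustration
sample = '<p><FONT face="Arial" size="2">Some text</FONT></p>'
print(strip_font_tags(sample))  # prints <p>Some text</p>
```

The formatting the tags carried is lost, so this only makes sense if you plan to re-apply formatting some other way, for example with a stylesheet.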
Word2000_paste_special.htm [file size: 4kb]
Created by copying text in Word and inserting it into a blank page in
SharePoint Designer or FrontPage using paste special to paste unformatted text.
In the SharePoint Designer Edit menu, choose Paste Text, then select Plain Text;
in the FrontPage Edit menu, choose Paste As, then select "Normal paragraphs" or
"Normal paragraphs with linebreaks".
Pro: small file; clean HTML; no subsequent editing problems.
Con: all formatting is removed, so it took five minutes to re-format
(particularly slow for long and complex documents); re-applying the formatting
can introduce errors.
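The idea behind pasting unformatted text can be sketched in a few lines of Python. This is my own illustration, not how SharePoint Designer or FrontPage actually do it: the function name text_to_paragraphs and the keep_linebreaks parameter are hypothetical, and I am assuming that blank lines separate paragraphs and that the "with linebreaks" option corresponds to keeping single line breaks as <br> tags.

```python
import html
import re

def text_to_paragraphs(text: str, keep_linebreaks: bool = False) -> str:
    """Wrap plain text in <p> tags, one per block separated by blank lines."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    paragraphs = []
    for block in blocks:
        # Escape &, < and > so the text cannot be mistaken for markup
        escaped = html.escape(block, quote=False)
        if keep_linebreaks:
            escaped = escaped.replace("\n", "<br>\n")
        else:
            escaped = " ".join(escaped.split())  # fold line breaks into spaces
        paragraphs.append(f"<p>{escaped}</p>")
    return "\n".join(paragraphs)

print(text_to_paragraphs("Fish & chips\nare nice\n\nSecond paragraph"))
# prints:
# <p>Fish &amp; chips are nice</p>
# <p>Second paragraph</p>
```

The output contains nothing but paragraph tags and text, which is why pages made this way are small and easy to edit, and also why all the formatting has to be re-applied by hand.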
Word2000_hand.htm [file size: 4kb]
Coded all HTML by hand in a text editor.
Pro: good code which can be validated to show that it meets standards (HTML 4.0
Transitional in this case); the page may look plain, but the clean code makes it
easy to apply formatting with stylesheets (stylesheets can be tricky to write,
but once created are an effective way of applying standard formatting quickly,
as in this example of the same page formatted with a stylesheet).
Con: slow (ten minutes); requires HTML skills; can introduce errors.
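To put the file sizes quoted above side by side, here is a small Python sketch that compares each approach with the hand-coded page. The labels are my own shorthand, and I am assuming that "15kb + 3kb" means the page plus its supporting files.

```python
# File sizes in kb, as quoted for each sample page above.
sizes_kb = {
    "Save As Web Page": 15 + 3,  # assumed: page plus supporting files
    "Paste from Word": 10,
    "Insert File": 5,
    "Paste special": 4,
    "Hand-coded": 4,
}

baseline = sizes_kb["Hand-coded"]
ratios = {name: size / baseline for name, size in sizes_kb.items()}

for name, ratio in ratios.items():
    print(f"{name}: {sizes_kb[name]}kb, {ratio:.2f}x the hand-coded page")
```

On these figures Save As Web Page comes out at 4.5 times the size of the hand-coded page and a straight paste at 2.5 times, which is where the "four or five times" and "twice as big" estimates come from.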
Other formats
To be avoided: Word2000sample.doc
[file size: 26kb]
This is the original Word document from which all the other pages were
created. It includes some straightforward formatting, and a simple table. A Word
document can be made available on the web by pointing a hyperlink at the file
(as has been done here).
Pro: easy.
Con: can be infected with viruses; big files; need to make sure users
have the right version of Word.
Word2000sample.rtf [file size: 22kb]
Created in Word using Save As and then selecting "rich text format".
Pro: easy; will open in most word processors.
Con: can be infected with viruses associated with a template file
(although this is far less common than Word .doc viruses); big
files.
Word2000sample.pdf
[file size: 16kb]
Created using Adobe's Acrobat Distiller; Office 2007
comes with PDF-writing options already installed.
Pro: preserves formatting and layout in a format optimised for printing;
end user can be prevented from modifying their copy of the file.
Con: files can be very large (although it's possible to optimise them:
this example file came to 113kb before optimisation); need to make sure users
have the free PDF reading software; author may need to buy the full Acrobat
program to create PDFs.
Recommended solutions
Whilst hand-coding of pages is still probably the best way to ensure
standards-compliant and problem-free web pages, in practice few of us still take
this approach. What I suggest is:
- Converting short pieces of text
Copy the text in Word and use paste special to paste unformatted text into a
web page open in SharePoint Designer or FrontPage.
- Longer documents
Consider paste special but, if this is not practical, insert the contents of
the Word document into a web page open in SharePoint Designer or FrontPage using
the Insert File option in the Insert menu.
- Making Word documents available on the web
Where you have a Word document which you want to make available on the web,
consider whether it really needs to be converted into HTML. Alternatives
include:
- Make the Word document available in rich text format.
- Convert the file to Adobe's PDF format.