Detagger - Convert from HTML to Text
HTML Markup Removal Tool
Starmount Recommended Software!
Detagger was written by JafSoft Ltd and is a powerful utility that removes some, or all, of the HTML tags from your HTML files. It is well suited to extracting text from your web pages, or to tidying up your HTML code so that your pages are cleaner and faster to load.
As an HTML to Text converter, Detagger allows you to convert HTML newsletters into a more compact and email-friendly format, helping authors easily maintain HTML and text versions. The program outputs the document as text, preserving the marked-up headings, lists and tables of the original document and turning them into suitable text formats. Text is laid out as faithfully as possible to the original document, within the constraints of your chosen page width.
There are many formatting options which can be saved in "policy" files so that they may be easily reloaded in later sessions.
Detagger allows you to:-
- Remove HTML tags from the pages of your web site, using the heading, paragraph and list tags etc. on each page to decide how the text should be formatted
- Parse tables and lay out the text accordingly. Simple tables can also be converted into comma-delimited (CSV) or tab-delimited data, ready for import into spreadsheets.
- Replace HTML hyperlinks with the display text. URLs may either be placed in the main text, or listed in a reference table at the end of the text.
- Format the output to your desired page width (may not work when parsing complex tables). This will often mean changing the layout slightly from that seen in the HTML.
- Format any "dialogue" intelligently. This is particularly useful when converting short stories written in HTML to text.
- Replace image tags with an image marker. This can be labelled with the image URL or the ALT attribute text.
- Add custom headers and footers to the text output. These can include merged-in data fields such as conversion date, title etc. The evaluation version adds a standard header; in the registered version this is omitted and you can choose to add your own headers.
- Change all HTML entities into the correct text characters. You can choose to have 8-bit characters replaced by 7-bit alternatives where available to give greatest compatibility of the output.
- Support the creation of Unicode text files from advanced HTML character sets.
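To illustrate the kind of conversion described in the list above, here is a minimal sketch in Python using only the standard library. It is not Detagger's own code or interface: the class `TextWithReferences`, the function `html_to_text` and the `width` parameter are hypothetical names chosen for this example. It covers just two of the features (replacing hyperlinks with their display text plus a numbered reference table at the end, and wrapping to a chosen page width) and makes no attempt at tables, images, dialogue or headers.

```python
from html.parser import HTMLParser
import textwrap

# Tags assumed (for this sketch) to start a new block of text.
BLOCK_TAGS = {"p", "br", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextWithReferences(HTMLParser):
    """Hypothetical helper: gathers text blocks and numbers each hyperlink."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities become characters
        self.blocks = [[]]   # one list of text fragments per block
        self.links = []      # URLs in order of appearance
        self._href = None    # URL of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.blocks.append([])
        elif tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            # Keep the display text, then add a marker such as " [1]".
            self.links.append(self._href)
            self.blocks[-1].append(f" [{len(self.links)}]")
            self._href = None

    def handle_data(self, data):
        self.blocks[-1].append(data)


def html_to_text(html, width=72):
    """Return plain text wrapped to `width`, with a reference table of URLs."""
    parser = TextWithReferences()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each block, then wrap it.
    paragraphs = (" ".join("".join(block).split()) for block in parser.blocks)
    body = "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs if p)
    if parser.links:
        refs = "\n".join(f"[{i}] {url}" for i, url in enumerate(parser.links, 1))
        body += "\n\nReferences\n" + refs
    return body


if __name__ == "__main__":
    sample = '<p>Visit <a href="http://example.com/">our site</a> today.</p>'
    print(html_to_text(sample, width=40))
```

A real converter has to make many more layout decisions than this, which is why the formatting options are saved in "policy" files rather than chosen afresh each time.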
Using Detagger to remove markup and manipulate HTML tags
As a markup remover, Detagger acts as a parser that allows you to "tidy up" your HTML code in a number of ways. You simply select the classes of HTML tags you want to remove, the sections of code you want stripped out, or the tag manipulations you want performed. If Detagger is used to strip out all HTML markup tags, then it will simply convert the HTML to text.
Parser options include:-
- remove all non-HTML tags (e.g. the extra MS Office tags added by Word)
- remove all non-standard tags
- remove the <HEAD>...</HEAD> section
- remove all <STYLE> tags, style sheets and CSS attributes
- remove all <SCRIPT> and JavaScript from the document
- remove all <FORM>, <INPUT>, <SELECT> etc. tags
- remove all <FONT> tags
- remove all comment tags
- remove all hyperlinks (replacing them with the display text only)
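Selective tag removal of this kind can be sketched with Python's standard `html.parser` module. The sketch below is only an illustration under stated assumptions, not how Detagger itself works: `TagStripper`, `strip_tags`, `drop_sections` and `unwrap_tags` are hypothetical names. It drops whole sections (by default `<HEAD>`, `<SCRIPT>` and `<STYLE>` with their contents), removes `<FONT>` tags while keeping the text inside them, discards comments, and passes everything else through.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: re-emits HTML minus selected tags and sections."""

    def __init__(self, drop_sections=("head", "script", "style"),
                 unwrap_tags=("font",)):
        # Keep entities as written so they can be passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.drop_sections = set(drop_sections)  # removed with their contents
        self.unwrap_tags = set(unwrap_tags)      # tag removed, contents kept
        self.out = []
        self._skip_depth = 0  # > 0 while inside a dropped section

    def _keep(self, tag):
        return not self._skip_depth and tag not in self.unwrap_tags

    def handle_starttag(self, tag, attrs):
        if tag in self.drop_sections:
            self._skip_depth += 1
        elif self._keep(tag):
            self.out.append(self.get_starttag_text())  # original tag text

    def handle_endtag(self, tag):
        if tag in self.drop_sections:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._keep(tag):
            self.out.append(f"</{tag}>")  # note: tag name is lowercased

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a section.
        if tag not in self.drop_sections and self._keep(tag):
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self._skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self._skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        pass  # comments are always discarded in this sketch


def strip_tags(html, **options):
    """Return `html` with the configured tags and sections removed."""
    parser = TagStripper(**options)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = ('<html><head><title>T</title></head><body><!-- note -->'
            '<FONT color="red">Hi &amp; bye</FONT>'
            '<script>alert(1);</script><p class="a">ok</p></body></html>')
    print(strip_tags(page))
```

Passing a different `drop_sections` or `unwrap_tags` tuple would select other classes of tag, for example `unwrap_tags=("font", "a")` to replace hyperlinks with their display text only.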
Download: Detagger html to text - Free 30-day trial.
Other software in JafSoft's range is listed on our support page.