Steven Black Consulting

Hooks and Anchors Design Pattern, Example 1

MS Excel "Save As HTML" cleanup.

Say you've got odious crap HTML from MS Excel like this (view its source). How do you clean it up so it's generic and clean like, say, this (view source)?

Simple. You set up an object society like the one illustrated below, whose lifetime and execution are controlled by the records in HooksConfig.DBF, listed in the table below the diagram.

Complete source code here: HooksAndAnchors.ZIP 
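To give a rough idea of how configuration records can drive a chain like this, here is a minimal sketch. It is not the code from HooksAndAnchors.prg. The class name SketchAnchor is hypothetical, and it assumes only that HooksConfig.DBF has the columns shown in the table further down and that every hook class exposes a Process() method taking the text by reference, like the anchor in the sample code does. How each hook consumes its Properties memo isn't shown in this article, so the sketch leaves that out.

```foxpro
*-- Hypothetical sketch, NOT the real HookAnchor class.
*-- Usage (assumed): loAnchor = CREATEOBJECT("SketchAnchor", "Excel Paste", "Root")
*--                  loAnchor.Process(@lcHTML)
DEFINE CLASS SketchAnchor AS Custom
  cSet  = ""   && which hook set to run, e.g. "Excel Paste"
  cType = ""   && which anchor type this is, e.g. "Root"

  PROCEDURE Init(tcSet, tcType)
    This.cSet  = tcSet
    This.cType = tcType
  ENDPROC

  PROCEDURE Process(tcText)
    LOCAL lcSet, lcType, lnCount, lnI, loHook
    LOCAL ARRAY laHooks[1]
    lcSet  = UPPER(ALLTRIM(This.cSet))
    lcType = UPPER(ALLTRIM(This.cType))

    *-- Fetch the active records for this set and type, in Seq order.
    *-- Inactive records (Lactive = .F.) are skipped.
    SELECT cClass, cLibrary, Seq, Properties ;
      FROM HooksConfig ;
      WHERE UPPER(ALLTRIM(cSet)) == lcSet ;
        AND UPPER(ALLTRIM(cType)) == lcType ;
        AND lActive ;
      ORDER BY Seq ;
      INTO ARRAY laHooks
    lnCount = _TALLY

    *-- Create each hook from its class library and let it work on the text.
    FOR lnI = 1 TO lnCount
      loHook = NEWOBJECT(ALLTRIM(laHooks[lnI, 1]), ALLTRIM(laHooks[lnI, 2]))
      loHook.Process(@tcText)
    ENDFOR
  ENDPROC
ENDDEFINE
```

The point of the design is that adding, removing, or reordering a cleanup step becomes a data change in HooksConfig.DBF rather than a code change.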

Given Garbage HTML, here's the hook society that cleans it...

 

 

....All created and invoked by this simple VFP code....

*-- This sample shows the cleansing of the
*-- odious HTML you get from an MS Excel "Save as HTML".
*--
*-- Environment
SET PROCEDURE TO HooksAndAnchors additive
SET PROCEDURE TO ParserHooks additive
LOCAL lcHTML, loChain
*-- Processing
lcHTML=FILETOSTR("crapfromexcel.htm")
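*-- "Excel Paste" and "Root" match the Cset and Ctype columns of HooksConfig.DBF
loChain=CREATEOBJECT("HookAnchor","Excel Paste","Root")
loChain.Process(@lcHTML) && clean this HTML pig!
*-- Show the results
STRTOFILE(lcHTML, "CleanHTMLFromExcel.HTM") && Tada
*-- SHELLEXEC() is not a native VFP function; presumably a ShellExecute wrapper defined elsewhere
SHELLEXEC("CleanHTMLFromExcel.HTM")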

....which is orchestrated by these records in HooksConfig.DBF.

Cset         Ctype               Cclass            Clibrary             Lactive  Seq  Properties
Excel Paste  ==================                                         .F.      0
Excel Paste  Root                BodyContentsOnly  ParserHooks.prg      .T.      10
Excel Paste  Root                StripContents     ParserHooks.prg      .T.      12   <![if,<![endif]>
Excel Paste  Root                StripContents     ParserHooks.prg      .T.      13   <!--,-->
Excel Paste  Root                MiscCharsRemove   ParserHooks.prg      .T.      14   x:
                                                                                      v:
Excel Paste  Root                HTMLTidy          ParserHooks.prg      .T.      20
Excel Paste  Root                DOMAnchor         HooksAndAnchors.prg  .T.      50
Excel Paste  DOMAnchor           KillNodesHook     ParserHooks.prg      .T.      50   cCollQuery=//table
                                                                                      cKill=col
Excel Paste  DOMAnchor           KillAttribsHook   ParserHooks.prg      .T.      100  cCollQuery=//table
                                                                                      cKill=bgcolor,width,cellpadding,class,str
Excel Paste  DOMAnchor           KillAttribsHook   ParserHooks.prg      .T.      200  cCollQuery=//td | //th
                                                                                      cKill=bgcolor,width,valign,class,height,num
Excel Paste  DOMAnchor           KillAttribsHook   ParserHooks.prg      .T.      300  cCollQuery=//tr
                                                                                      cKill=bgcolor,height,class
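
To make one of those records concrete, here is a sketch of what a StripContents-style hook might do with a Properties value like <![if,<![endif]>. The real class lives in ParserHooks.prg and may well work differently. The class name SketchStripContents and the cStart and cEnd properties are made up for illustration, and the sketch assumes the hook removes the delimiters along with everything between them.

```foxpro
*-- Hypothetical sketch, NOT the StripContents class from ParserHooks.prg.
LOCAL lcHTML, loHook
lcHTML = "<td>A</td><![if x]>junk<![endif]><td>B</td>"
loHook = CREATEOBJECT("SketchStripContents")
loHook.Process(@lcHTML)
? lcHTML   && <td>A</td><td>B</td>

DEFINE CLASS SketchStripContents AS Custom
  cStart = "<![if"        && opening delimiter (first item in Properties)
  cEnd   = "<![endif]>"   && closing delimiter (second item in Properties)

  PROCEDURE Process(tcText)
    LOCAL lnStart, lnEnd
    lnStart = ATC(This.cStart, tcText)
    DO WHILE lnStart > 0
      *-- Look for the closing delimiter after the opening one
      lnEnd = ATC(This.cEnd, SUBSTR(tcText, lnStart + LEN(This.cStart)))
      IF lnEnd = 0
        EXIT   && unbalanced delimiter: leave the rest of the text alone
      ENDIF
      *-- Cut from the opening delimiter through the end of the closing one
      tcText = STUFF(tcText, lnStart, ;
        LEN(This.cStart) + lnEnd - 1 + LEN(This.cEnd), "")
      lnStart = ATC(This.cStart, tcText)
    ENDDO
  ENDPROC
ENDDEFINE
```

Because the delimiters come from the record rather than the class, the same class can serve both the Seq 12 and Seq 13 records, with only the Properties value differing.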