
HTML-to-XML Conversion Examples

    The following tests show how SgmlReader converts malformed HTML into well-formed XML. Extended characters may render incorrectly because this page is generated on the fly from the HTML test file on GitHub.

    Test 1

    Before
    <html>
    <body><span text />
    </body>
    </html>
    
    After
    <html>
      <body>
        <span text="text" />
      </body>
    </html>
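
Test 1 expands a minimized attribute (`text` with no value) to `text="text"`. The following is a minimal Python sketch of that normalization built on the standard library's `html.parser`, not on SgmlReader itself; `AttrExpander` and `expand` are illustrative names, not part of any SgmlReader API. Unlike the Test 7 output, `html.parser` lowercases tag and attribute names.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class AttrExpander(HTMLParser):
    """Re-serialize tags so that every attribute has a quoted value."""

    def __init__(self):
        super().__init__()
        self.out = []

    def _emit(self, tag, attrs, close):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # Minimized attribute: reuse the name as the value.
                value = name
            parts.append(f"{name}={quoteattr(value)}")
        self.out.append("<" + " ".join(parts) + close)

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)


def expand(html_text):
    parser = AttrExpander()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(expand("<span text />"))
```

The same helper also quotes unquoted values such as `class=MsoNormal`.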
    

    Test 2

    Before
    <html>
    <body><span text="foo>bar"/>
    </body>
    </html>
    
    After
    <html>
      <body>
        <span text="foo">bar"/&gt;
    </span>
      </body>
    </html>
    

    Test 3

    Before
    <html>
    <body><span text="foo<bar"/>
    </body>
    </html>
    
    After
    <html>
      <body>
        <span text="foo&lt;bar" />
      </body>
    </html>
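
In Test 3 the literal `<` inside the attribute value is written out as `&lt;`. As a rough illustration in Python (not SgmlReader code), the standard library's `xml.sax.saxutils.quoteattr` performs the same kind of escaping; `write_attr` is a hypothetical helper name.

```python
from xml.sax.saxutils import quoteattr


def write_attr(name, value):
    # quoteattr escapes &, < and > and wraps the value in suitable quotes.
    return f"{name}={quoteattr(value)}"


print(write_attr("text", "foo<bar"))  # text="foo&lt;bar"
```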
    

    Test 4

    Before
    <html>
    <body>
    <tag>&test&nbsp&nbsp blah blah</tag>
    </body>
    </html>
    
    After
    <html>
      <body>
        <tag>&amp;test  blah blah</tag>
      </body>
    </html>
    

    Test 5

    Before
    <html>
    <body>
    <tag>&nbsp&nbsp&nbsp blah blah</tag>
    </body>
    </html>
    
    After
    <html>
      <body>
        <tag>   blah blah</tag>
      </body>
    </html>
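
Tests 4 and 5 show two entity rules: a known entity is resolved even when its semicolon is missing (`&nbsp`), while an unknown name (`&test`) keeps its ampersand, escaped as `&amp;`. A small Python sketch that reproduces this behavior with the standard library follows. It assumes HTML5 entity rules as implemented by `html.unescape`, which may differ in detail from the DTD that SgmlReader uses; `normalize_text` is an illustrative name.

```python
import html
from xml.sax.saxutils import escape


def normalize_text(raw):
    # Step 1: resolve known character references, with or without ';'.
    decoded = html.unescape(raw)
    # Step 2: re-escape the characters that are special in XML text (&, <, >).
    return escape(decoded)


result = normalize_text("&test&nbsp&nbsp blah blah")
print(result.replace("\u00a0", "[NBSP]"))  # &amp;test[NBSP][NBSP] blah blah
```

In this sketch each `&nbsp` becomes the non-breaking space character U+00A0, which looks like an ordinary space when displayed.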
    

    Test 6

    Before
    <html>
    <body>
    <p>bad char: <span>&#1048576;</span></p>
    </body>
    </html>
    
    After
    <html>
      <body>
        <p>bad char: <span>��</span></p>
      </body>
    </html>
    

    Test 7

    Before
    <html>
    <body>
    <P class=MsoNormal dir=ltr 
    style="MARGIN: 0pt;" align=left><?xml:namespace 
    prefix = st1 ns = "urn:schemas-microsoft-com:office:smarttags" 
    /><ST1:PERSONNAME></ST1:PERSONNAME></P>
    </body>
    </html>
    
    After
    <html>
      <body>
        <P class="MsoNormal" dir="ltr" style="MARGIN: 0pt;" align="left">
          <?namespace 
    prefix = st1 ns = "urn:schemas-microsoft-com:office:smarttags" 
    ?>
          <ST1:PERSONNAME xmlns:ST1="#unknown">
          </ST1:PERSONNAME>
        </P>
      </body>
    </html>
    

    Test 8

    Before
    <html>
    <body>
    <DIV STYLE="top:214px; left:139px; position:absolute; font-size:26px;"><NOBR><SPAN STYLE="font-family:"Wingdings 2";"></SPAN></NOBR></DIV>
    </body>
    </html>
    
    After
    <html>
      <body>
        <DIV STYLE="top:214px; left:139px; position:absolute; font-size:26px;">
          <NOBR>
            <SPAN STYLE="font-family:" Wingdings="Wingdings" _x0032_=";">
            </SPAN>
          </NOBR>
        </DIV>
      </body>
    </html>
    

    Test 9

    Before
    <html>
    <body>
    <script type="text/javascript">/*<![CDATA[*/
    var test = '<div>"test"</div>';
    /*]]>*/</script>
    <p>test</p>
    </body>
    </html>
    
    After
    <html>
      <body>
        <script type="text/javascript"><![CDATA[
    var test = '<div>"test"</div>';
    ]]></script>
        <p>test</p>
      </body>
    </html>
    

    Test 10

    Before
    <html>
    <body>This <P>is bad </P> XHTML.</body>
    </html>
    
    After
    <html>
      <body>This <p>is bad </p> XHTML.</body>
    </html>
    

    Test 11

    Before
    <html>
    <body><span>some text</span> <span>more text</span></body>
    </html>
    
    After
    <html>
    <body><span>some text</span> <span>more text</span></body>
    </html>
    

    Test 12

    Before
    <html>
    <body><a href="http://www.cnn.com/"' title="cnn.com">cnn</a></body>
    </html>
    
    After
    <html>
      <body>
        <a href="http://www.cnn.com/">cnn</a>
      </body>
    </html>
    

    Test 13

    Before
    <html>
    <head>
    <style>
    <!--
    </style>
    </head>
    </html>
    
    After
    <html>
      <head>
        <style>
          <!--
    </style>
    </head>
    </html>
    -->
        </style>
      </head>
    </html>
    

    Test 14

    Before
    <html>
      <body>&apos;</body>
    </html>
    
    After
    <html>
      <body>'</body>
    </html>
    

    Test 15

    Before
    <script type="text/javascript></script>
    
    After
    <html>
      <script type="text/javascript">
      </script>
    </html>
    

    Test 16

    Before
    <html xmlns="http://www.w3.org/1999/xhtml"><head /><body><table u1:str="" x:str=""></table></body></html>
    
    After
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head />
      <body>
        <table u1:str="" x:str="" xmlns:x="#unknown1" xmlns:u1="#unknown">
        </table>
      </body>
    </html>
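
Test 16 illustrates why the placeholder declarations matter: the Before markup uses the prefixes `u1` and `x` without declaring them, so a namespace-aware XML parser rejects it, while the After markup parses. The Python sketch below checks both with `xml.etree.ElementTree`. The `placeholder_namespaces` helper is hypothetical; it merely mimics the `#unknown`, `#unknown1` names seen in the output and says nothing about how SgmlReader names further prefixes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def placeholder_namespaces(prefixes, declared=()):
    """Map undeclared prefixes to '#unknown', '#unknown1', ... (assumed scheme)."""
    mapping = {}
    for prefix in prefixes:
        if prefix in declared or prefix in mapping:
            continue
        suffix = len(mapping) or ""  # first placeholder has no numeric suffix
        mapping[prefix] = f"#unknown{suffix}"
    return mapping


BEFORE = ('<html xmlns="http://www.w3.org/1999/xhtml"><head /><body>'
          '<table u1:str="" x:str=""></table></body></html>')
AFTER = ('<html xmlns="http://www.w3.org/1999/xhtml"><head /><body>'
         '<table u1:str="" x:str="" xmlns:x="#unknown1" xmlns:u1="#unknown">'
         '</table></body></html>')

print(is_well_formed(BEFORE))               # False: unbound prefix
print(is_well_formed(AFTER))                # True
print(placeholder_namespaces(["u1", "x"]))  # {'u1': '#unknown', 'x': '#unknown1'}
```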
    

    Test 17

    Before
    <html>
        <body>&sup2;</body>
    </html>
    
    After
    <html>
      <body>²</body>
    </html>
    

    Test 18

    Before
    <html>
        <body>
           <something@something.com>
        </body>
    </html>
    
    After
    <html>
      <body>&lt;something@something.com&gt;</body>
    </html>
    

    Test 19

    Before
    <html>
        <body>
            <script type="text/javascript">/*<![CDATA[*/ /*<![CDATA[*/ test /*]]>*/ /*]]&gt;*/</script>
        </body>
    </html>
    
    After
    <html>
      <body>
        <script type="text/javascript"><![CDATA[  test  /*]]&gt;*/]]></script>
      </body>
    </html>
    

    Test 20

    Before
    <html>
    	<body>
    		<style>div.wiki { float: right; }</style>
    		<em>foo</em>
    	</body>
    </html>
    
    After
    <html>
      <body>
        <style><![CDATA[div.wiki { float: right; }]]></style>
        <em>foo</em>
      </body>
    </html>
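
Test 20 wraps the stylesheet text in a CDATA section so that characters such as `<` or `&` in CSS or script code need no escaping. The Python sketch below shows one common way to build such a section. Splitting on `]]>` is a general XML technique and an assumption here, not necessarily what SgmlReader does (compare Test 19); `to_cdata` is an illustrative name.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    # A CDATA section cannot contain "]]>", so split it across two sections.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return f"<![CDATA[{safe}]]>"


print(to_cdata("div.wiki { float: right; }"))
# <![CDATA[div.wiki { float: right; }]]>

# Round trip: an XML parser recovers the original text, including "]]>".
element = ET.fromstring("<style>" + to_cdata("a ]]> b") + "</style>")
print(element.text)  # a ]]> b
```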
    

    Test 21

    Before
    <html><body><title>Title</title><foo>foo</foo></body></html>
    
    After
    <html>
      <body>
        <title>Title</title>
        <foo>foo</foo>
      </body>
    </html>
    

    Test 22

    Before
    <html><body>
    <p class="MsoNormal">
    	<span style="font-size: 10pt;" arial="" ,="" sans-serif="" ;;="" font-family:dummy:="" font-family:="" font-family:foo:="" arial;="" font-size:="" 13.3333px;="">
    		<span class="Apple-style-span" style="font-family: Arial; font-size: 13.3333px;">-lm</span>
    	</span>
    </p>
    </body></html>
    
    After
    <html>
      <body>
        <p class="MsoNormal">
          <span style="font-size: 10pt;" arial="" sans-serif="">
            <span class="Apple-style-span" style="font-family: Arial; font-size: 13.3333px;">-lm</span>
          </span>
        </p>
      </body>
    </html>
    

    Test 23

    Before
    <html><body>do <![if !supportLists]>not<![endif]> lose this text</body></html>
    
    After
    <html>
      <body>do not lose this text</body>
    </html>
    

    Test 24

    Before
    <html xmlns="http://implicit" xmlns:n="http://explicit"><foo attr1="1" n:attr2="2" /><n:foo attr1="1" n:attr2="2" /></html>
    
    After
    <html xmlns="http://implicit" xmlns:n="http://explicit">
      <foo attr1="1" n:attr2="2" />
      <n:foo attr1="1" n:attr2="2" />
    </html>
    

    Test 25

    Before
    <html xmlns:n="http://explicit"><foo attr1="1" n:attr2="2" /><n:foo attr1="1" n:attr2="2" /></html>
    
    After
    <html xmlns:n="http://explicit">
      <foo attr1="1" n:attr2="2" />
      <n:foo attr1="1" n:attr2="2" />
    </html>
    

    Test 26

    Before
    <html xmlns:n="http://explicit"><foo attr1="1" n:attr2="2" /><n:foo attr1="1" n:attr2="2" /></html>
    
    After
    <html xmlns:n="http://explicit">
      <foo attr1="1" n:attr2="2" />
      <n:foo attr1="1" n:attr2="2" />
    </html>
    

    Test 27

    Before
    <html><foo xmlns:n="http://explicit" attr1="1" n:attr2="2" /></html>
    
    After
    <html>
      <foo xmlns:n="http://explicit" attr1="1" n:attr2="2" />
    </html>
    

    Test 28

    Before
    <html><foo xmlns:n="http://explicit" attr1="1" n:attr2="2" /></html>
    
    After
    <html>
      <foo xmlns:n="http://explicit" attr1="1" n:attr2="2" />
    </html>
    

    Test 29

    Before
    <html xmlns:o="http://microsoft.com"><body>A<o:p></o:p>B<o:p></o:p></body></html>
    
    After
    <html xmlns:o="http://microsoft.com">
      <body>A<o:p></o:p>B<o:p></o:p></body>
    </html>
    

    Test 30

    Before
    <html xmlns:o="http://microsoft.com"><body>A<o:p></o:p>B<o:p></o:p></body></html>
    
    After
    <html xmlns:o="http://microsoft.com">
      <body>A<o:p />B<o:p /></body>
    </html>
    

    Test 31

    Before
    <html><body>A<o:p></o:p>B<o:p></o:p></body></html>
    
    After
    <html>
      <body>A<o:p xmlns:o="#unknown"></o:p>B<o:p xmlns:o="#unknown"></o:p></body>
    </html>
    

    Test 32

    Before
    <html><body>A<o:p></o:p>B<o:p></o:p></body></html>
    
    After
    <html>
      <body>A<o:p xmlns:o="#unknown" />B<o:p xmlns:o="#unknown" /></body>
    </html>
    

    Test 33

    Before
    <html><body>
    
    After
    <html>
      <body>
      </body>
    </html>
    

    Test 34

    Before
    
    <html>
    
    After
    
    
    <html>
    </html>
    

    Test 35

    Before
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> 
    <html>
    
    After
    <html>
    </html>
    

    Test 36

    Before
    <html>
    <body>
    <table><tr><td>row1<tr><td>row2</td>
    
    After
    <html>
      <body>
        <table>
          <tr>
            <td>row1</td>
          </tr>
          <tr>
            <td>row2</td>
          </tr>
        </table>
      </body>
    </html> 
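
Test 36 shows end tags being inferred: a new `<tr>` closes the open `<td>` and `<tr>`, and the end of input closes everything still open. The Python sketch below imitates that with a simple element stack. The `STOP_AT` rule table is hypothetical and covers only table rows and cells, it is not how SgmlReader decides containment, and the sketch drops attributes and trims text to stay short.

```python
from html.parser import HTMLParser

# Hypothetical rules: opening KEY closes open elements up to the nearest VALUE.
STOP_AT = {"tr": {"table"}, "td": {"tr"}}


class AutoCloser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements
        self.out = []

    def _close_top(self):
        self.out.append(f"</{self.stack.pop()}>")

    def handle_starttag(self, tag, attrs):
        stops = STOP_AT.get(tag)
        if stops and any(t in stops for t in self.stack):
            while self.stack[-1] not in stops:
                self._close_top()
        self.stack.append(tag)
        self.out.append(f"<{tag}>")  # attributes omitted for brevity

    def handle_endtag(self, tag):
        if tag in self.stack:  # ignore stray end tags
            while self.stack[-1] != tag:
                self._close_top()
            self._close_top()

    def handle_data(self, data):
        if data.strip():  # whitespace-only text is dropped, other text trimmed
            self.out.append(data.strip())

    def finish(self):
        self.close()  # flush buffered text
        while self.stack:  # close whatever is still open
            self._close_top()
        return "".join(self.out)


def close_tags(html_text):
    parser = AutoCloser()
    parser.feed(html_text)
    return parser.finish()


print(close_tags("<html>\n<body>\n<table><tr><td>row1<tr><td>row2</td>\n"))
```

Run on the Test 36 input, this yields the same element structure as the After block, without the indentation.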
    

    Test 37

    Before
    <html> 
    <head> 
    <script language="JavaScript"> 
    <!-- 
    --></script> 
    </head> 
    <body> 
    <p>hello</p> 
    </body> 
    </html> 
    
    After
    <html>
      <head>
        <script language="JavaScript">
          <!-- 
    -->
        </script>
      </head>
      <body>
        <p>hello</p>
      </body>
    </html>
    

    Test 38

    Before
    <html>
    <![CDATA[this is a CDATA block with markup <table><tr><td> ]]>
    </html>
    
    After
    <html><![CDATA[this is a CDATA block with markup <table><tr><td> ]]></html>
    

    Test 39

    Before
    <p>This is really <messed_up.< p>.
    
    After
    <html>
      <p>This is really <messed_up.>&lt; p&gt;.
    </messed_up.></p>
    </html>
    

    Test 40

    Before
    <html><class="black">Text………</html>
    
    After
    <html>
      <class>Text………</class>
    </html>
    

    Test 41

    Before
    <p>&copy;</p>
    <br/>
    
    After
    <html>
      <p>©</p>
      <br />
    </html>
    

    Test 42

    Before
    <html> 
      <img src="img.gif" height"4" width= 2 > 
    </html>
    
    After
    <html>
      <img src="img.gif" height="4" width="2" />
    </html>
    

    Test 43

    Before
    <html>
      <script><![CDATA[this is a test]]></script>
    </html>
    
    After
    <html>
      <script><![CDATA[this is a test]]></script>
    </html>
    

    Test 44

    Before
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
    <HTML></HTML>
    
    After
    <html>
    </html>
    

    Test 45

    Before
    <b>foo</b>
    
    After
    <html>
      <b>foo</b>
    </html>
    

    Test 46

    Before
    blah <b>foo</b>
    
    After
    <html>blah <b>foo</b></html>
    

    Test 47

    Before
    <!-- top --> <b>foo</b>
    
    After
    <!-- top -->
    <html>
      <b>foo</b>
    </html>
    

    Test 48

    Before
    <html>
    <body>
    <p>&#x5a;&#90;&#90 test &#90</p>
    
    After
    <html>
      <body>
        <p>ZZZ test Z</p>
      </body>
    </html>
    

    Test 49

    Before
    <html>
      <?xml version="1.0" encoding="UTF-16"?>
    </html>
    
    After
    <html>
    </html>
    

    Test 50

    Before
    <html><?xml:namespace prefix="st1" ns="urn:schemas-microsoft-com:office:smarttags" />
    <body>
    
    After
    <html>
      <?namespace prefix="st1" ns="urn:schemas-microsoft-com:office:smarttags" ?>
      <body>
      </body>
    </html>
    

    Test 51

    Before
    <html xmlns:portal="http://schemas.microsoft.com/msn/portal/controls"><head><title>Welcome to MSN.com</title>
    
    After
    <html xmlns:portal="http://schemas.microsoft.com/msn/portal/controls">
      <head>
        <title>Welcome to MSN.com</title>
      </head>
    </html>
    

    Test 52

    Before
    <html xmlns:portal="http://schemas.microsoft.com/msn/portal/controls"><head><title>Welcome to MSN.com</title>
    
    After
    <html xmlns:portal="http://schemas.microsoft.com/msn/portal/controls">
      <head>
        <title>Welcome to MSN.com</title>
      </head>
    </html>
    

    Viewing 4 of 4 comments: view all
    It seems like the examples have gone, but it would be useful if they reappeared :)

    - Regin
    Posted 03:58, 13 May 2011
    @kvakulo the examples are dynamically rendered from the repo. we forgot to update the link when we moved the code to github. thanks for pointing it out!
    Posted 05:46, 13 May 2011
    @SteveB thanks for the quick fix!
    Posted 05:48, 13 May 2011
    Hi Team, the examples are gone again. Could you bring them back?

    Thanks
    Posted 16:36, 24 Jun 2011

    Copyright © 2011 MindTouch, Inc.