Is There An Easy Way To Convert HTML With Multiple <br/> Tags Into Proper Surrounding <p> Tags In Javascript?

Let's say I have a bunch of HTML like below:

bla bla bla long paragraph here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>

Is there an easy way w

Solution 1:

I got bored, so here is an attempt. I'm sure it needs optimizations and tweaks. It uses a little bit of jQuery to do its magic, and it worked in FF3. And the answer to your question is that there isn't a very "simple" way :)

$(function() {
  $.fn.pmaker = function() {
    var brs = 0;
    var nodes = [];

    function makeP()
    {
      // only bother doing this if we have nodes to stick into a P
      if (nodes.length) {
        var p = $("<p/>");
        p.insertBefore(nodes[0]);  // insert a new P before the content
        p.append(nodes); // add the children        
        nodes = [];
      }
      brs=0;
    }

    this.contents().each(function() {    
      if (this.nodeType == 3) // text node 
      {
        // if the text has non whitespace - reset the BR counter
        if (/\S+/.test(this.data)) {
          nodes.push(this);
          brs = 0;
        }
      } else if (this.nodeType == 1) {
        if (/^br$/i.test(this.tagName)) { // anchored so tags like ABBR do not match
          if (++brs == 2) {
            $(this).remove(); // remove this BR from the dom
            $(nodes.pop()).remove(); // delete the previous BR from the array and the DOM
            makeP();
          } else {
            nodes.push(this);
          }
        } else if (/^(?:p)$/i.test(this.tagName)) {
          // these tags force a P break, but we don't scan within them
          makeP();
        } else if (/^(?:div)$/i.test(this.tagName)) {
          // force a P break and scan within
          makeP();
          $(this).pmaker();
        } else {
          brs = 0; // some other tag - reset brs.
          nodes.push(this); // add the node 
          // specific nodes to not peek inside of - inline tags
          if (!(/^(?:b|i|strong|em|span|u)$/i.test(this.tagName))) {
            $(this).pmaker(); // peek inside for P needs            
          }
        } 
      } 
    });
    while ((brs--)>0) { // remove any extra BR's at the end
      $(nodes.pop()).remove();
    }
    makeP();
    return this;
  };

  // run it against something (we are already inside the document-ready handler):
  $("#worker").pmaker();
});

And this was the HTML portion I tested against:

<div id="worker">
bla bla bla long <b>paragraph</b> here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>
this text should end up in a P
<div class='test'>
  and so should this
  <br/>
  <br/>
  and this<br/>without breaking at the single BR
</div>
and then we have the a "buggy" clause
<p>
  fear the real P!
</p>
and a trailing br<br/>
</div>

And the result:

<div id="worker"><p>
bla bla bla long <b>paragraph</b> here
</p>
<p>
bla bla bla more paragraph text
</p>
<p>
this text should end up in a P
</p><div class="test"><p>
  and so should this
  </p>
  <p>
  and this<br/>without breaking at the single BR
</p></div><p>
and then we have the a "buggy" clause
</p><p>
  fear the real P!
</p><p>
and a trailing br</p>
</div>

Solution 2:

Scan each of the child elements + text of the enclosing element. Each time you encounter a "br" element, create a "p" element, and append all pending stuff to it. Lather, rinse, repeat.

Don't forget to remove the stuff which you are relocating to a new "p" element.

I have found this library (prototype.js) to be useful for this sort of thing.
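
A minimal sketch of the scan-and-wrap idea described above, written against the plain DOM rather than prototype.js. The function names groupByBreaks and wrapInParagraphs are hypothetical, and the sketch assumes every "br" acts as a separator and that whitespace-only runs between breaks should not become paragraphs. The grouping step only reads nodeName, nodeType and data, so it can be tried with plain objects outside a browser.

```javascript
// Returns true if the run holds anything other than whitespace-only text nodes.
function hasContent(group) {
  return group.some(function (node) {
    return !(node.nodeType === 3 && !/\S/.test(node.data));
  });
}

// Splits a flat list of child nodes into runs separated by BR elements.
function groupByBreaks(childNodes) {
  var groups = [];
  var pending = [];
  function flush() {
    if (hasContent(pending)) groups.push(pending);
    pending = [];
  }
  for (var i = 0; i < childNodes.length; i++) {
    var node = childNodes[i];
    if (node.nodeName.toUpperCase() === "BR") {
      flush(); // a break ends the pending run
    } else {
      pending.push(node);
    }
  }
  flush(); // whatever is left after the last break
  return groups;
}

// Browser-only wiring (assumes a global `document`): relocate each run into a new <p>.
function wrapInParagraphs(container) {
  var children = Array.prototype.slice.call(container.childNodes);
  groupByBreaks(children).forEach(function (group) {
    var p = document.createElement("p");
    container.insertBefore(p, group[0]); // put the <p> where the run starts
    group.forEach(function (node) {
      p.appendChild(node); // appendChild moves the node out of the container
    });
  });
  // The BR separators are no longer needed once the paragraphs exist.
  children.forEach(function (node) {
    if (node.nodeName.toUpperCase() === "BR") container.removeChild(node);
  });
}

// Trying the grouping step with stand-in objects instead of real DOM nodes:
var sampleNodes = [
  { nodeName: "#text", nodeType: 3, data: "first" },
  { nodeName: "BR", nodeType: 1 },
  { nodeName: "#text", nodeType: 3, data: "\n" },
  { nodeName: "BR", nodeType: 1 },
  { nodeName: "#text", nodeType: 3, data: "second" },
  { nodeName: "B", nodeType: 1 }
];
var sampleGroups = groupByBreaks(sampleNodes);
// sampleGroups holds two runs: ["first"] and ["second", <B>]
```

Because appendChild moves a node rather than copying it, the "remove the stuff you are relocating" step happens automatically for the content nodes; only the breaks have to be removed by hand.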


Solution 3:

I'm assuming you're not really allowing any other… Sometimes you need to preserve single line-breaks (not all <br /> elements are bad), and you only want to turn double instances of <br /> into paragraph breaks.

In doing so I would:

  1. Remove all line breaks (the newline characters in the source, not the <br /> elements)
  2. Wrap the whole lot in a paragraph
  3. Replace <br /><br /> with </p>\n<p>
  4. Lastly, remove any empty <p></p> elements that might have been generated

So the code could look something like:

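var ConvertToParagraphs = function(text) {
    // 1. Remove newline characters (including Windows-style \r\n)
    var lineBreaksRemoved = text.replace(/\r?\n/g, "");
    // 2. Wrap everything in a single paragraph
    var wrappedInParagraphs = "<p>" + lineBreaksRemoved + "</p>";
    // 3. Turn each pair of consecutive <br> tags into a paragraph boundary
    var brsRemoved = wrappedInParagraphs.replace(/<br[^>]*>\s*<br[^>]*>/gi, "</p>\n<p>");
    // 4. Drop empty (or whitespace-only) paragraphs left behind
    var emptyParagraphsRemoved = brsRemoved.replace(/<p>\s*<\/p>/g, "");
    return emptyParagraphsRemoved;
};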

Note: I've been exceedingly verbose to show the process; you'd simplify it, of course.

This turns your sample:

bla bla bla long paragraph here
<br/>
<br/>
bla bla bla more paragraph text
<br/>
<br/>

Into:

<p>bla bla bla long paragraph here</p>
<p>bla bla bla more paragraph text</p>

But it does so without removing any <br /> elements that you may actually want.
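
As a quick check of that claim, here is a condensed sketch of the same four steps chained together. The name toParagraphs and the input string are made up for illustration; the regular expressions assume the input contains no markup other than <br> tags.

```javascript
// Condensed form of the four steps: strip newlines, wrap, split on double <br>, drop empties.
function toParagraphs(text) {
  return ("<p>" + text.replace(/\r?\n/g, "") + "</p>")
    .replace(/<br[^>]*>\s*<br[^>]*>/gi, "</p>\n<p>")
    .replace(/<p>\s*<\/p>/g, "");
}

var input = "first line<br/>still the first paragraph\n<br/>\n<br/>\nsecond paragraph";
var output = toParagraphs(input);
// output is:
// <p>first line<br/>still the first paragraph</p>
// <p>second paragraph</p>
```

The single <br/> inside the first paragraph survives, while the double break becomes a paragraph boundary.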


Solution 4:

I'd do it in several stages:

  1. RegExp: Convert all br-tags to line-breaks.
  2. RegExp: Strip out all the white-space.
  3. RegExp: Convert the multiple line-breaks to single ones.
  4. Use String.prototype.split('\n') on the result.

That should give an array with all the 'real' paragraphs (in theory). Then you can just iterate through it and wrap each line in p-tags.
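
The stages above could be sketched roughly as follows. The function name brsToParagraphs is hypothetical, and "strip out all the white-space" is interpreted here as trimming spaces and tabs around each line break, which is an assumption. Note that this approach treats every <br>, single or double, as a paragraph boundary.

```javascript
function brsToParagraphs(html) {
  // 1. Convert all br-tags to line breaks.
  var text = html.replace(/<br\b[^>]*>/gi, "\n");
  // 2. Strip spaces, tabs and carriage returns around each line break.
  text = text.replace(/[ \t\r]*\n[ \t\r]*/g, "\n");
  // 3. Collapse runs of line breaks into a single one.
  text = text.replace(/\n{2,}/g, "\n");
  // 4. Split into lines, drop empty ones, and wrap each remaining line in a <p>.
  return text
    .split("\n")
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line !== ""; })
    .map(function (line) { return "<p>" + line + "</p>"; })
    .join("\n");
}

var sample =
  "bla bla bla long paragraph here\n<br/>\n<br/>\n" +
  "bla bla bla more paragraph text\n<br/>\n<br/>\n";
var result = brsToParagraphs(sample);
// result is:
// <p>bla bla bla long paragraph here</p>
// <p>bla bla bla more paragraph text</p>
```

Because newlines already present in the source are also treated as breaks, text that is hard-wrapped across several source lines would be split into separate paragraphs; that limitation is inherent to the staged approach as described.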

