
Adjust Regex To Ignore Anything Else Inside Link Html Tags

So I have this regex: (.*)<\/a> So far I have been able to get it to match HTML link tags that have extra attributes in them. Like classes an

Solution 1:

Always Use DOM Parsing instead of regex

This has been suggested many times. Judging by the comments and the increasingly complicated regex taking shape, it would be easier to just examine the DOM. Take the following, for example:

function fragmentFromString(strHTML) {
  return document.createRange().createContextualFragment(strHTML);
}

let html = `<a data-popup-text="take me to <a href='http://www.google.com'>a search engine</a>" href="testing.html" data-id="1" data-popup-text="take me to <a href='http://www.google.com'>a search engine</a>"><p>Testing <span>This</span></p></a>`;
let fragment = fragmentFromString(html);
let aTags = Array.from(fragment.querySelectorAll('a'));

aTags = aTags.map(a => {
  return {
    href: a.href,
    text: a.textContent
  }
});

console.log(aTags);

The above turns a string of HTML into actual DOM nodes inside a fragment. You would still need to append that fragment somewhere to display it, but the point is that you can now query the a tags. The code gives you an array of objects, one per a tag, each holding the tag's href value and its textContent, that is, the text with all the HTML stripped out.
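As a variation on the same idea, here is a minimal sketch that reads the raw attribute with getAttribute('href') instead of the a.href property. In a browser, a.href returns the URL resolved against the page's base address, so "testing.html" would come back as an absolute URL. The helper name extractLinks and the sample markup are assumptions made for illustration, not part of the original answer, and the DOMParser usage only runs where a DOM is available.

```javascript
// Hypothetical helper (not from the original answer): collects link data
// from any root that supports querySelectorAll, such as a fragment or document.
function extractLinks(root) {
  return Array.from(root.querySelectorAll('a[href]'), a => ({
    // getAttribute returns the attribute exactly as written in the markup,
    // whereas a.href would return the resolved absolute URL.
    href: a.getAttribute('href'),
    text: a.textContent.trim()
  }));
}

// Browser-only usage: DOMParser builds a detached document,
// so nothing has to be appended to the page before querying it.
if (typeof DOMParser !== 'undefined') {
  const sample = '<a class="x" href="testing.html"><p>Testing <span>This</span></p></a>';
  const doc = new DOMParser().parseFromString(sample, 'text/html');
  console.log(extractLinks(doc));
}
```

Because the helper only relies on querySelectorAll, it should work equally well with the fragment returned by fragmentFromString above.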


Original answer. Don't use it; it stays only as context for the real problem:

I changed this a little to use non-greedy quantifiers (.*?). It also avoids ending the match early when an attribute value contains closing HTML, as pointed out by @Gaby aka G. Petrioli.

<.*?href="(.*?)"(?:[^"]*")+>(.*)<\/a>

Check out the JS fiddle
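To make the "don't use it" advice concrete, here is a small sketch that runs the regex above against three sample strings. The inputs are invented for illustration and are not from the original question. The sketch shows one case that matches as intended and two that do not.

```javascript
// The regex from the original answer, unchanged.
const linkRegex = /<.*?href="(.*?)"(?:[^"]*")+>(.*)<\/a>/;

// Hypothetical sample inputs, invented for this illustration.
const withTrailingAttr = '<a class="x" href="testing.html" data-id="1">Testing</a>';
const hrefLast = '<a class="x" href="testing.html">Testing</a>';
const twoLinks = '<a href="a.html" id="1">A</a> <a href="b.html" id="2">B</a>';

// Case 1: href is followed by another quoted attribute, so the pattern matches.
const m1 = withTrailingAttr.match(linkRegex);
console.log(m1[1], m1[2]); // testing.html Testing

// Case 2: href is the last attribute. (?:[^"]*")+ needs at least one more
// double quote after the href value, so there is no match at all.
const m2 = hrefLast.match(linkRegex);
console.log(m2); // null

// Case 3: two links in one string. The greedy quote group runs on into the
// second tag, so the first href ends up paired with the second link's text.
const m3 = twoLinks.match(linkRegex);
console.log(m3[1], m3[2]); // a.html B
```

Cases 2 and 3 are the kind of edge cases that keep pushing the pattern toward more complexity, which is why the DOM approach at the top of this answer is the safer route.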
