Adjust Regex To Ignore Anything Else Inside Link Html Tags
Solution 1:
Always Use DOM Parsing instead of regex
This has been suggested many times. Judging by the comments and the increasingly complicated regex taking shape, it would be easier to work with the DOM directly. Take the following for example:
// Parse an HTML string into a DocumentFragment (browser-only API).
function fragmentFromString(strHTML) {
  return document.createRange().createContextualFragment(strHTML);
}

let html = `<a data-popup-text="take me to <a href='http://www.google.com'>a search engine</a>" href="testing.html" data-id="1" data-popup-text="take me to <a href='http://www.google.com'>a search engine</a>"><p>Testing <span>This</span></p></a>`;
let fragment = fragmentFromString(html);

// Collect the href and the tag-free text of every <a> in the fragment.
let aTags = Array.from(fragment.querySelectorAll('a')).map(a => ({
  href: a.href, // resolved against the page URL; a.getAttribute('href') gives the raw value
  text: a.textContent
}));
console.log(aTags);
The above turns a string of HTML into actual DOM nodes inside a fragment. You still need to append that fragment somewhere if you want it rendered, but the point is that you can now query the a tags. The code gives you an array of objects, one per a tag, each holding its href value and its textContent, which is the text with all the HTML stripped out.
Original answer. Don't use it; it stays only as context for the real problem:
I changed this a little to use a non-greedy quantifier (.*?). It also avoids ending the match early when an attribute value contains closing HTML, as pointed out by @Gaby aka G. Petrioli.
<.*?href="(.*?)"(?:[^"]*")+>(.*)<\/a>
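To see why the regex above is kept only as context, here is a small sketch that runs it against three short inputs. The helper name extractLink and the sample strings are made up for illustration; only the pattern comes from the answer.

```javascript
// The regex from the original answer, unchanged.
const linkPattern = /<.*?href="(.*?)"(?:[^"]*")+>(.*)<\/a>/;

// Hypothetical helper: returns the two captures, or null when nothing matches.
function extractLink(html) {
  const match = linkPattern.exec(html);
  return match ? { href: match[1], inner: match[2] } : null;
}

// One link with a quoted attribute after href: both captures look right.
const single = extractLink('<a href="testing.html" data-id="1"><p>Testing <span>This</span></p></a>');
console.log(single); // { href: 'testing.html', inner: '<p>Testing <span>This</span></p>' }

// Two links in one string: the quote-skipping group runs on into the second
// tag, so the first href gets paired with the second link's text.
const pair = extractLink('<a href="a.html" id="1">A</a><a href="b.html" id="2">B</a>');
console.log(pair); // { href: 'a.html', inner: 'B' }

// A link whose only attribute is href: (?:[^"]*")+ needs at least one more
// double quote after the href value, so there is no match at all.
const bare = extractLink('<a href="x.html">text</a>');
console.log(bare); // null
```

The second and third cases are the kind of input the pattern quietly mishandles, which is the practical argument for querying the DOM instead.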