PHP Techniques for Removing HTML Markup from Strings

Using strip_tags()

The most straightforward way to remove HTML markup from a string in PHP is the built-in strip_tags() function. It strips HTML and PHP tags, along with HTML comments, from a string and returns the remaining text content.

Syntax:

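strip_tags(string $string, array|string|null $allowed_tags = null): string

$string: The input string containing HTML tags.

$allowed_tags: An optional parameter listing the tags that should not be removed. It accepts a string such as '<a><strong>' or, as of PHP 7.4, an array of tag names such as ['a', 'strong'].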

Example:

<?php
$html_content = "<p>This is a <strong>sample</strong> paragraph with <a href='#'>HTML</a> tags.</p>";
$plain_text = strip_tags($html_content);
echo $plain_text; // Output: "This is a sample paragraph with HTML tags."
?>

In this example, all HTML tags are stripped out, leaving only the plain text.
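One thing to watch for: strip_tags() removes the tags without inserting any whitespace, so text from adjacent block elements ends up glued together. The sketch below shows the effect and one possible workaround. The helper name strip_tags_spaced() is hypothetical (it is not a PHP built-in), and the approach assumes that a space before every tag is acceptable for your content.

```php
<?php
// Hypothetical helper, not a built-in: strip tags without gluing adjacent blocks together.
function strip_tags_spaced(string $html): string
{
    // Put a space before every tag so "</p><p>" does not join two words.
    $spaced = str_replace('<', ' <', $html);
    // Collapse the extra whitespace that the previous step introduced.
    return trim(preg_replace('/\s+/', ' ', strip_tags($spaced)));
}

$html = "<p>First paragraph.</p><p>Second paragraph.</p>";
$plain = strip_tags($html);
$spaced_plain = strip_tags_spaced($html);

echo $plain, PHP_EOL;        // First paragraph.Second paragraph.
echo $spaced_plain, PHP_EOL; // First paragraph. Second paragraph.
```

The trade-off is that a space is also inserted around inline tags, so markup in the middle of a word or directly before punctuation will leave a stray space behind.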

Allowing Specific Tags:

If you want to allow certain HTML tags, you can pass them as a second parameter:

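<?php
// Same input as the previous example, repeated so this snippet runs on its own.
$html_content = "<p>This is a <strong>sample</strong> paragraph with <a href='#'>HTML</a> tags.</p>";
$plain_text_with_links = strip_tags($html_content, '<a>');
echo $plain_text_with_links; // Output: "This is a sample paragraph with <a href='#'>HTML</a> tags."
?>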
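As a small illustration of the allow-list, the sketch below passes it both as a string and, on PHP 7.4 or later, as an array of tag names. The input string and the onclick attribute are made up for the example. Note that strip_tags() keeps every attribute on an allowed tag, so the allow-list on its own should not be treated as a sanitizer for untrusted input.

```php
<?php
$html = "<p>Read the <a href='#' onclick='track()'>docs</a> <em>now</em>.</p>";

// String form of the allow-list.
$from_string = strip_tags($html, '<a><em>');
// Array form (PHP 7.4+): tag names without angle brackets.
$from_array = strip_tags($html, ['a', 'em']);

echo $from_string, PHP_EOL;
// Read the <a href='#' onclick='track()'>docs</a> <em>now</em>.
var_dump($from_string === $from_array); // bool(true)
```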

Using Regular Expressions with preg_replace()

For more complex scenarios, such as removing only specific tags or attributes, we can use regular expressions with preg_replace().

Example:

<?php
$html_content = "<div><p>This is a <span style='color:red'>sample</span> text with <a href='#'>link</a>.</p></div>";
$pattern = "/<[^>]*>/"; // Regular expression to match any HTML tag
$plain_text = preg_replace($pattern, '', $html_content);
echo $plain_text; // Output: "This is a sample text with link."
?>

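This approach gives us more control, but regular expressions do not parse HTML. A pattern like the one above breaks on a > character inside an attribute value, and it leaves the contents of <script> and <style> elements in the output. Writing and maintaining such patterns gets harder as the HTML gets more complex.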
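To make the "specific tags" case concrete, here is a minimal sketch of two targeted patterns. The helper names remove_tag() and remove_script_and_style() are hypothetical, and both assume reasonably well-formed markup; for arbitrary HTML a real parser is the safer choice.

```php
<?php
// Hypothetical helper: remove one tag (opening and closing) but keep its inner text.
function remove_tag(string $html, string $tag): string
{
    $name = preg_quote($tag, '/');
    return preg_replace('/<\/?' . $name . '\b[^>]*>/i', '', $html) ?? $html;
}

// Hypothetical helper: drop <script> and <style> elements together with their contents.
function remove_script_and_style(string $html): string
{
    return preg_replace('/<(script|style)\b[^>]*>.*?<\/\1>/is', '', $html) ?? $html;
}

$html = "<p>Hi <span style='color:red'>there</span><script>alert(1)</script></p>";

$without_span = remove_tag($html, 'span');
$clean_text = strip_tags(remove_script_and_style($html));

echo $without_span, PHP_EOL; // <p>Hi there<script>alert(1)</script></p>
echo $clean_text, PHP_EOL;   // Hi there
```

Running remove_script_and_style() before strip_tags() matters because strip_tags() only removes the <script> tags themselves and would otherwise leave alert(1) in the text.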

Using htmlspecialchars_decode()

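If your input is HTML-encoded and you need to convert the entities back to their corresponding characters before stripping tags, you can use htmlspecialchars_decode(). It decodes only the special characters produced by htmlspecialchars(): &amp;, &lt;, &gt;, and, depending on the flags, quotes.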

Example:

<?php
$html_content = "&lt;p&gt;This is &lt;strong&gt;encoded&lt;/strong&gt; HTML&lt;/p&gt;";
$decoded_content = htmlspecialchars_decode($html_content);
$plain_text = strip_tags($decoded_content);
echo $plain_text; // Output: "This is encoded HTML"
?>
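The sketch below compares htmlspecialchars_decode() with html_entity_decode(), which is covered in the next section, on the same made-up input. The flags and the 'UTF-8' encoding are passed explicitly here as an assumption, so that the result does not depend on the PHP version's defaults.

```php
<?php
$encoded = "&lt;em&gt;Caf&eacute;&lt;/em&gt; &copy; 2024 &amp; beyond";

// Decodes only &amp;, &lt;, &gt; and quotes.
$special_only = strip_tags(htmlspecialchars_decode($encoded, ENT_QUOTES));
// Decodes the named entities of HTML 4.01 as well, producing UTF-8 characters.
$all_entities = strip_tags(html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8'));

echo $special_only, PHP_EOL; // Caf&eacute; &copy; 2024 & beyond
echo $all_entities, PHP_EOL; // Café © 2024 & beyond
```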

Combining strip_tags() and html_entity_decode()

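Content may also contain other HTML entities, such as &copy; or &eacute;, that htmlspecialchars_decode() leaves untouched. html_entity_decode() decodes the full set of named entities for the chosen document type, plus numeric ones, and can be combined with strip_tags().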

Example:

<?php
$html_content = "<p>This &amp; that are <strong>important</strong>.</p>";
$plain_text = strip_tags(html_entity_decode($html_content));
echo $plain_text; // Output: "This & that are important."
?>
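The order of the two calls changes the result when the text contains encoded tags. The following sketch, using a made-up input string, runs both orders side by side. Which one is right depends on whether the encoded markup is meant to be removed or shown as literal text.

```php
<?php
$html = "<p>Type &lt;b&gt;bold&lt;/b&gt; for bold text &amp; more.</p>";

// Decode first: the encoded <b> becomes a real tag and is stripped too.
$decode_then_strip = strip_tags(html_entity_decode($html, ENT_QUOTES | ENT_HTML401, 'UTF-8'));
// Strip first: only the real <p> is removed; the encoded <b> survives as literal text.
$strip_then_decode = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $decode_then_strip, PHP_EOL; // Type bold for bold text & more.
echo $strip_then_decode, PHP_EOL; // Type <b>bold</b> for bold text & more.
```

If the second result is later written back into an HTML page, it needs to be escaped again (for example with htmlspecialchars()), since it now contains literal angle brackets.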