Features:
- PCRE instead of POSIX regular expressions (performance improvement)
- Stricter rules, more security.
- Targeted protection from the ASCII insertion bug that does not affect the entire text.
- E-mail obfuscation.
- More bbcodes (see below).
Instructions:
1. Download the archive from the link below.
2. Open your functions.php, find your sed_cc function and roll it back to the original state, as provided in sed_cc.txt.
3. Find your sed_bbcode function. Erase it and insert the sed_bbcode, sed_bbcode_repl and sed_crypt_mail functions from sed_bbcode_v2.txt.
4. You may also want to replace the sed_bbcode_urls and sed_bbcode_autourls functions, though this is optional.
As you probably know, PCRE (Perl-Compatible Regular Expressions) have always been faster and more powerful than POSIX extended regular expressions. Seditio uses POSIX regexps in most cases, including post parsing, which hurts overall throughput. In this tutorial we will replace the most important ereg_ calls with preg_ ones.
All you need to do is edit your system/functions.php. Just replace your sed_bbcode() with the function provided at the download link at the bottom of this page.
This version of sed_bbcode() also provides more XSS protection, XHTML Strict compliance and a few more bbcodes:
Code:
[s]Line-through text[/s]
Lists now look like this:
[ol]
[li]Ordered list item[/li]
[li]Ordered list item[/li]
[/ol]
[list]
[li]Unordered list item[/li]
[li]Unordered list item[/li]
[/list]
[h1]Header 1[/h1]
[h2]Header 2[/h2]
[h3]Header 3[/h3]
[h4]Header 4[/h4]
[h5]Header 5[/h5]
[br]
[size=1]Extra extra small[/size]
And so on from 1 to 7:
[size=7]Extra extra large[/size]
[table]
[tr][th]Heading1[/th][th]Heading 2[/th][/tr]
[tr][td]Cell1[/td][td]Cell2[/td][/tr]
[tr][td]Cell3[/td][td]Cell4[/td][/tr]
[/table]
Flash/DivX codes:
[flash]http://www.site.com/someflash.swf[/flash]
[flash w=500 h=400]http://www.site.com/someflash.swf[/flash]
[divx]http://www.site.com/someflash.divx[/divx]
[divx w=500 h=400]http://www.site.com/someflash.swf[/divx]
Another feature is automatic spam-bot protection for e-mail addresses. It turns all e-mail URLs into scripts, which work as links in browsers but are invisible to spam-bots.
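As a rough illustration of the idea only: the real sed_crypt_mail comes with the download and is not reproduced here, so the function below, example_crypt_mail, is a hypothetical stand-in. It assumes single-byte (ASCII) input and emits the link as a script that rebuilds it from character codes, so the address never appears literally in the page source.

```php
<?php
// Hypothetical sketch, NOT the real sed_crypt_mail() from sed_bbcode_v2.txt.
// Assumes $html contains only single-byte (ASCII) characters.
function example_crypt_mail($html)
{
    $codes = array();
    $len = strlen($html);
    for ($i = 0; $i < $len; $i++) {
        // Store each character as its numeric code so the address
        // is not readable as plain text in the markup.
        $codes[] = ord($html[$i]);
    }
    return '<script type="text/javascript">document.write(String.fromCharCode('
        . implode(',', $codes) . '));</script>';
}

$out = example_crypt_mail('<a href="mailto:user@example.com">user@example.com</a>');
```

A browser with JavaScript enabled writes the original link back into the page; a harvester that only scans the raw HTML for "@" or "mailto:" finds neither.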
Note that this version supports only lower-case bbcodes. Olivier promotes this as a feature: it works faster, and it lets you use upper case when you need similar text in square brackets that should not be parsed as bbcode. If you want your bbcodes to be case-insensitive, do the following:
1. Look at the replacement PCRE expressions in sed_bbcode. They start with "#" and end with "#" (or "#s"). All you need to do is replace the ending "#" with "#i" (or "#s" with "#si").
2. If you are using PHP5, replace the str_replace calls in sed_bbcode() with str_ireplace. The PHP4 solution is much more resource-consuming, so I omit it.
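The two steps can be sketched like this. The function example_bbcode_ci is hypothetical and much simpler than the real sed_bbcode(), and the [b] to strong mapping is an assumption made only for the example; the color pattern follows the style of the parser's own patterns.

```php
<?php
// Sketch only: a simplified stand-in for part of sed_bbcode(), PHP 5+.
function example_bbcode_ci($text)
{
    // Simple tags: str_ireplace() ignores case where str_replace() would not.
    $text = str_ireplace(
        array('[b]', '[/b]'),
        array('<strong>', '</strong>'),
        $text
    );
    // Pattern tags: the trailing "i" modifier makes the match case-insensitive.
    $text = preg_replace(
        '#\[color=([\dA-F]{6})\](.+?)\[/color\]#si',
        '<span style="color:#$1">$2</span>',
        $text
    );
    return $text;
}

echo example_bbcode_ci('[B]bold[/B] [COLOR=ff0000]red[/COLOR]');
// prints: <strong>bold</strong> <span style="color:#ff0000">red</span>
```

Note that the "i" modifier also makes the character class [\dA-F] accept lower-case hex digits, which the case-sensitive version rejects.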
I highly recommend that you read the Auto-closing BBcodes article.
HOWTO add new bbcodes:
OK, here is a quick guide to adding new bbcodes to parser v2:
1. If your bbcode does not take any parameters and uses no external resources (e.g. text formatting), just use the 'code' => 'replacement' arrays with str_replace, as you did with the old parser.
2. If your bbcode does not link to external resources but needs to take some parameters (e.g. custom quotations, page/user urls by ID, etc.) you need to add a valid PCRE pattern and replacement into this array:
[code]$bbcodes = array(
'#\[color=([\dA-F]{6})\](.+?)\[/color\]#s' => '<span style="color:#$1">$2</span>',
'#\[style=([1-9])\](.+?)\[/style\]#s' => '<span class="bbstyle$1">$2</span>',
'#\[user=(\d+)\](.+?)\[/user\]#' => '<a href="users.php?m=details&id=$1">$2</a>',
'#\[page=(\d+)\](.+?)\[/page\]#' => '<a href="page.php?id=$1">$2</a>',
'#\[page\](\d+)\[/page\]#' => '<a href="page.php?id=$1">'.$L['Page'].' #$1</a>',
'#\[group=(\d+)\](.+?)\[/group\]#' => '<a href="users.php?g=$1">$2</a>',
'#\[topic\](\d+)\[/topic\]#' => '<a href="forums.php?m=posts&q=$1">'.$L['Topic'].' #$1</a>',
'#\[post\](\d+)\[/post\]#' => '<a href="forums.php?m=posts&p=$1#$1">'.$L['Post'].' #$1</a>',
'#\[pm\](\d+)\[/pm\]#' => '<a href="pm.php?m=send&to=$1"><img src="skins/'.$skin.'/img/system/icon-pm.png" alt=""></a>',
/* AND SO ON */
);[/code]
If you need more assistance with PCRE, please read the PCRE reference in the PHP manual. And be careful to filter out all invalid characters to keep your site secure.
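For orientation, here is a minimal sketch of how a pattern => replacement array of this kind can be applied in one call. The wrapper example_apply_bbcodes is hypothetical, and the way the real sed_bbcode() applies its array may differ; the two entries are modelled on the array above.

```php
<?php
// Sketch: applying a pattern => replacement array with a single call.
// example_apply_bbcodes() is illustrative, not part of Seditio.
function example_apply_bbcodes($text)
{
    $bbcodes = array(
        '#\[color=([\dA-F]{6})\](.+?)\[/color\]#s' => '<span style="color:#$1">$2</span>',
        '#\[user=(\d+)\](.+?)\[/user\]#' => '<a href="users.php?m=details&id=$1">$2</a>',
    );
    // preg_replace() accepts parallel arrays of patterns and replacements.
    return preg_replace(array_keys($bbcodes), array_values($bbcodes), $text);
}

echo example_apply_bbcodes('[user=5]Admin[/user] wrote [color=FF0000]this[/color]');
// prints: <a href="users.php?m=details&id=5">Admin</a> wrote <span style="color:#FF0000">this</span>
```

Because (\d+) and [\dA-F]{6} accept only digits and upper-case hex digits, input such as [user=5"onclick=...] simply fails to match and is left as plain text, which is the kind of filtering recommended above.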
3. If your bbcode links to some external resource (e.g. it is a URL or an image), it needs explicit protection from cross-site scripting. You get it by adding the regexp pattern to this array:
[code]$bbcodes = array(
'#\[(img)\]([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\[/img\]#',
'#\[(img)=([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\]([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\[/img\]#',
'#\[(thumb)=([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\]([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\[/thumb\]#',
'#\[(t)=([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\]([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\[/t\]#',
'#\[(url)=([^\s"\';\(\[]+)\](.+?)\[/url\]#',
'#\[(url)\]([^\s"\'\(\[]+)\[/url\]#',
'#\[(email)=([._\w\d\-]+@[\w\d\-]+\.[a-z\.]+)\](.+?)\[/email\]#',
'#\[(email)\]([._\w\d\-]+@[\w\d\-]+\.[a-z\.]+)\[/email\]#'
);[/code]
and the replacement should be located in a special function called sed_bbcode_repl. Use the existing patterns as examples when composing your own, and be careful not to allow malicious characters. As for the replacement, look at the sed_bbcode_repl function:
[code]function sed_bbcode_repl($mt)
{
if($mt[1] == 'img')
return count($mt) == 3 ? '<img src="'.str_replace('', '', $mt[2]).'" alt="" />'
: '<a href="'.str_replace('', '', $mt[2]).'"><img src="'.str_replace('', '', $mt[3]).'" alt="" /></a>';
elseif($mt[1] == 'thumb')
return '<a href="pfs.php?m=view&v='.str_replace('', '', $mt[3]).'"><img src="'.str_replace('', '', $mt[2]).'" alt="" /></a>';
elseif($mt[1] == 't')
return '<a href="'.str_replace('', '', $mt[3]).'"><img src="'.str_replace('', '', $mt[2]).'" alt="" /></a>';
elseif($mt[1] == 'url')
return count($mt) == 3 ? '<a href="'.str_replace('', '', $mt[2]).'">'.$mt[2].'</a>'
: '<a href="'.str_replace('', '', $mt[2]).'">'.$mt[3].'</a>';
elseif($mt[1] == 'email')
return count($mt) == 3 ? sed_crypt_mail('<a href="mailto:'.$mt[2].'">'.$mt[2].'</a>')
: sed_crypt_mail('<a href="mailto:'.$mt[2].'">'.$mt[3].'</a>');
return '';
}[/code]
As you can see, to add a new bbcode replacement you add a new clause:
[code]elseif($mt[1] == 'your_bbcode')
{
return 'your bbcode HTML here';
}[/code]
$mt is an array containing matched subpatterns.
$mt[0] is the entire bbcode with tags and body.
$mt[1] should capture the bbcode name. The remaining array items are the subpattern captures you defined in your search pattern. If a capture may contain a URL, you should strip the unwanted sequences from it with str_replace, as the existing code does.
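To make the layout of $mt concrete, here is a small sketch that runs two of the [url] patterns through preg_replace_callback. It assumes the parser dispatches to its handler this way, which the handler's signature suggests but the text does not state. The handler example_bbcode_repl is a hypothetical cut-down copy that handles only [url] and omits the str_replace filtering.

```php
<?php
// Hypothetical handler in the style of sed_bbcode_repl(); only [url] is handled.
function example_bbcode_repl($mt)
{
    // $mt[0] = whole match, $mt[1] = tag name, $mt[2..] = remaining captures.
    if ($mt[1] == 'url') {
        // Three items means [url]address[/url]; four means [url=address]text[/url].
        return count($mt) == 3
            ? '<a href="' . $mt[2] . '">' . $mt[2] . '</a>'
            : '<a href="' . $mt[2] . '">' . $mt[3] . '</a>';
    }
    return '';
}

$patterns = array(
    '#\[(url)=([^\s"\';\(\[]+)\](.+?)\[/url\]#',
    '#\[(url)\]([^\s"\'\(\[]+)\[/url\]#',
);
// The callback is invoked once per match with that match's $mt array.
$html = preg_replace_callback(
    $patterns,
    'example_bbcode_repl',
    '[url=http://example.com]Example[/url] and [url]http://example.org[/url]'
);
echo $html;
// prints: <a href="http://example.com">Example</a> and <a href="http://example.org">http://example.org</a>
```

The count($mt) == 3 test is what tells the short form from the long one: the short pattern has two capture groups, the long one has three.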
For example, if you want to add [imgex] and [thumbex] tags, you should do it like this:
1. Add the following search patterns to the array as described above:
[code]'#\[(imgex)\]([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\.([^"]+?)\[/imgex\]#',
'#\[(thumbex)=([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\]([^\s"\';\?\(\[]+\.(?:jpg|jpeg|gif|png))\.([^"]+?)\[/thumbex\]#',[/code]
2. Add the replacement handler to sed_bbcode_repl:
[code]elseif($mt[1] == 'imgex')
return '<img src="'.str_replace('', '', $mt[2]).'" alt="'.str_replace('', '', $mt[3]).'" />';
elseif($mt[1] == 'thumbex')
return '<a href="pfs.php?m=view&v='.str_replace('', '', $mt[3]).'" rel="lightbox" title="'.str_replace('', '', $mt[4]).'"><img src="'.str_replace('', '', $mt[2]).'" alt="'.str_replace('', '', $mt[4]).'" /></a>';[/code]