Hack 23. Straighten Smart Quotes

Convert curly quotes, apostrophes, and other fancy typographical symbols back to their ASCII equivalents.

Have you ever copied a block of text from a web site and pasted it into a text editor (or into a weblog post of your own)? The text comes through, but all the apostrophes and quote marks end up as random-looking symbols. The web site uses fancy publishing software to produce smart quotes and apostrophes, but your text editor doesn't understand them. This hack dumbs down these fancy typographical symbols to their ASCII equivalents.

3.4.1. The Code

This user script runs on all pages. It constructs an array of fancy characters (by their Unicode representation). Then, it gets a list of all the text nodes on the page and executes a search-and-replace on each node to convert each fancy character to a plain-text equivalent.

Learn more about Unicode at http://www.unicode.org.
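
If you want to add a character to the list and do not know its escape sequence, a small helper can print it for you. The following is only a sketch: toEscape is a hypothetical name that is not part of the hack, and it looks at just the first UTF-16 code unit of the string you pass in.

```javascript
// Hypothetical helper (not part of the hack): return the JavaScript escape
// sequence for the first UTF-16 code unit of a string.
function toEscape(ch) {
    var code = ch.charCodeAt(0);
    var hex = code.toString(16);
    if (code <= 0xff) {
        // Two-digit \xNN form, as used for "\xa0" and "\xa9".
        return "\\x" + ("0" + hex).slice(-2);
    }
    // Four-digit \uNNNN form, as used for "\u2018" and "\u2019".
    return "\\u" + ("000" + hex).slice(-4);
}

console.log(toEscape("\u2019")); // \u2019
console.log(toEscape("\xa9"));   // \xa9
```

Paste the character you are curious about between the quotes, and copy the printed escape into the replacement list.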


In JavaScript, the replace method can take a regular expression object as its first parameter. For performance reasons, we build all our regular expressions first, and then reuse them every time through the loop. If we had used the inline regular expression syntax, Firefox would need to rebuild each regular expression object every time through the loop, which would be a significant performance drain on large pages!
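
The build-once, reuse-many-times pattern can be tried on plain strings, outside of any web page. This is a minimal sketch under a few assumptions: the table holds only a handful of the characters the hack handles, and buildRegexes and straighten are hypothetical names chosen for illustration.

```javascript
// Sketch only: a small subset of the characters handled by the hack.
var replacements = {
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
    "\u2014": "--"
};

// Build each global regular expression once, up front.
function buildRegexes(table) {
    var regexes = {};
    for (var key in table) {
        regexes[key] = new RegExp(key, "g");
    }
    return regexes;
}

var regexes = buildRegexes(replacements);

// Reuse the prebuilt objects for every string that needs converting.
function straighten(text) {
    for (var key in replacements) {
        text = text.replace(regexes[key], replacements[key]);
    }
    return text;
}

var samples = ["they\u2019re available", "use them\u2014or often"];
for (var i = 0; i < samples.length; i++) {
    console.log(straighten(samples[i]));
}
// they're available
// use them--or often
```

However many strings pass through straighten, the RegExp constructor runs only once per table entry.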

Save the following user script as dumbquotes.user.js:

    // ==UserScript==
    // @name          DumbQuotes
    // @namespace     http://diveintomark.org/projects/greasemonkey/
    // @description   straighten curly quotes and apostrophes
    // @include       *
    // ==/UserScript==

    // Map each fancy character to its plain-text equivalent.
    var arReplacements = {
        "\xa0": " ",
        "\xa9": "(c)",
        "\xae": "(r)",
        "\xb7": "*",
        "\u2018": "'",
        "\u2019": "'",
        "\u201c": '"',
        "\u201d": '"',
        "\u2026": "...",
        "\u2002": " ",
        "\u2003": " ",
        "\u2009": " ",
        "\u2013": "-",
        "\u2014": "--",
        "\u2122": "(tm)"};

    // Build each regular expression once, so the loop below can reuse it.
    var arRegex = new Array();
    for (var sKey in arReplacements) {
        arRegex[sKey] = new RegExp(sKey, 'g');
    }

    // Select every text node that is not inside a script or style element.
    var snapTextNodes = document.evaluate("//text()[" +
        "not(ancestor::script) and not(ancestor::style)]",
        document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);

    // Apply every replacement to the text of each node.
    for (var i = snapTextNodes.snapshotLength - 1; i >= 0; i--) {
        var elmTextNode = snapTextNodes.snapshotItem(i);
        var sText = elmTextNode.data;
        for (var sKey in arReplacements) {
            sText = sText.replace(arRegex[sKey], arReplacements[sKey]);
        }
        elmTextNode.data = sText;
    }

3.4.2. Running the Hack

Before installing the user script, go to http://www.alistapart.com/articles/emen/. As shown in Figure 3-6, the fourth paragraph reads "But the larger problem is, now that they're available, almost no one publishing on the web today knows how to use them or often even knows of their existence." There are two fancy characters here: the curly apostrophe in the word they're and the em dash between them and or.

Figure 3-6. Web page with fancy typography


Now, install the user script (Tools → Install This User Script) and refresh the page at http://www.alistapart.com/articles/emen/. As shown in Figure 3-7, the two fancy characters have been replaced with their ASCII equivalents. The apostrophe has been converted to a straight apostrophe, and the dash has been replaced with two hyphen characters.

Figure 3-7. Web page with plain typography


Although this hack currently focuses on typographical symbols, there is nothing typography-specific about it. It's just a generic script that does global search-and-replace on the text of a web page. By altering the arReplacements array, you can replace any character, word, or phrase with anything else, on any web page. Obviously, this can lead to all sorts of mischief, if you are so inclined. I will leave this one up to your imagination.
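
One caveat applies when you move from single characters to words and phrases: the script passes each key straight to the RegExp constructor, so a key containing characters such as . or + is treated as a pattern rather than as literal text (and a key like C++ is not even a valid pattern). The sketch below shows one way around that, assuming you escape each key first. The escapeRegExp and replaceAll helpers and the table entries are hypothetical illustrations, not part of the original script, and for brevity the regular expressions are built inline rather than up front.

```javascript
// Hypothetical helper: backslash-escape regular expression metacharacters
// so that a key is matched literally.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Illustrative entries only; substitute your own words and phrases.
var wordReplacements = {
    "e.g.": "for example",
    "C++": "C plus plus"
};

// Apply every literal replacement in the table to a string.
function replaceAll(text, table) {
    for (var key in table) {
        text = text.replace(new RegExp(escapeRegExp(key), "g"), table[key]);
    }
    return text;
}

console.log(replaceAll("I like C++, e.g. templates.", wordReplacements));
// I like C plus plus, for example templates.
```

Replacements are applied one after another, so an earlier entry can produce text that a later entry then matches. Keep that in mind when ordering a longer table.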

    Greasemonkey Hacks: Tips & Tools for Remixing the Web with Firefox
    ISBN: 0596101651
    Year: 2005
    Pages: 168
    Authors: Mark Pilgrim
