8.10. Java Version Differences

As mentioned at the start of the chapter, this book primarily covers Java 1.5.0. However, as of this writing, Java 1.4.2 is still widely used, and Java 1.6 is waiting in the wings (in beta release, but likely to be officially released soon). With this in mind, I'll summarize the regex-related changes between versions 1.4.2 and 1.5.0 (as of Update 7), and between 1.5.0 and the current "build 59g" second beta release of 1.6.

8.10.1. Differences Between 1.4.2 and 1.5.0

Aside from a collection of new methods, Java 1.5.0 has changed little from Java 1.4.2. Most of the new methods were added in support of the new concept of a matcher's region. In addition, Unicode support was upgraded and tweaked a bit. All the changes are detailed in the next two sections. [*]

[*] The undocumented recursive construct (?1), unofficially available in Java 1.4.2, is no longer included as of Java 1.5.0. It's similar to the same construct in PCRE (☞ 476), but being undocumented in Java, it is relegated here to a footnote.

8.10.1.1. New methods in Java 1.5.0

All region-related matcher methods are missing from Java 1.4.2:

  • region

  • regionStart

  • regionEnd

  • useAnchoringBounds

  • hasAnchoringBounds

  • useTransparentBounds

  • hasTransparentBounds
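To make the region methods listed above concrete, here is a minimal sketch that needs Java 1.5.0 or later. The class name RegionDemo and the helper findInRegion are hypothetical names chosen for illustration; only the Matcher methods themselves come from the list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {
    // Returns the first match of regex inside text[start, end), or null if there is none.
    static String findInRegion(String regex, String text, int start, int end,
                               boolean anchoring, boolean transparent) {
        Matcher m = Pattern.compile(regex).matcher(text);
        m.region(start, end);                 // restrict matching to [start, end)
        m.useAnchoringBounds(anchoring);      // may ^ and $ match at the region edges?
        m.useTransparentBounds(transparent);  // may lookaround and \b see past the edges?
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Madagascar";

        // Region 4..7 is "gas". With anchoring bounds (the default), ^ matches at the region start.
        System.out.println(findInRegion("^gas", text, 4, 7, true, false));       // gas
        // Without anchoring bounds, ^ matches only at the true start of the text.
        System.out.println(findInRegion("^gas", text, 4, 7, false, false));      // null

        // Region 7..10 is "car". With opaque bounds (the default), \b cannot see the preceding "s".
        System.out.println(findInRegion("\\bcar\\b", text, 7, 10, true, false)); // car
        // With transparent bounds, \b sees the "s" before "c", so there is no word boundary.
        System.out.println(findInRegion("\\bcar\\b", text, 7, 10, true, true));  // null
    }
}
```

None of these calls compile against the Java 1.4.2 library; code that must run there has to fall back on other means, such as matching against a substring.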

These other matcher methods are also missing from Java 1.4.2:

  • toMatchResult

  • hitEnd

  • requireEnd

  • usePattern

  • toString
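As a rough illustration of three of these methods (toMatchResult, hitEnd, and usePattern), consider the following sketch for Java 1.5.0 or later. The class MatcherExtrasDemo and its helper methods are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherExtrasDemo {
    // Collects a snapshot of every match. Each MatchResult stays valid
    // even after the matcher moves on to the next match.
    static List<MatchResult> snapshots(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<MatchResult> results = new ArrayList<MatchResult>();
        while (m.find()) {
            results.add(m.toMatchResult());
        }
        return results;
    }

    // Reports whether the first find() ran into the end of the input,
    // which means more input could have changed the result.
    static boolean hitEndAfterFind(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        m.find();
        return m.hitEnd();
    }

    // Finds a lowercase word, then switches patterns in midstream to find
    // the next number after that word. Returns "word:number" or null.
    static String wordThenNumber(String text) {
        Matcher m = Pattern.compile("[a-z]+").matcher(text);
        if (!m.find()) {
            return null;
        }
        String word = m.group();
        m.usePattern(Pattern.compile("\\d+")); // the position in the text is kept
        return m.find() ? word + ":" + m.group() : null;
    }

    public static void main(String[] args) {
        for (MatchResult r : snapshots("\\d+", "a1b22c333")) {
            System.out.println(r.group() + " at " + r.start());
        }
        System.out.println(hitEndAfterFind("\\d+", "abc123")); // true
        System.out.println(hitEndAfterFind("\\d+", "123abc")); // false
        System.out.println(wordThenNumber("7 abc 42"));        // abc:42
    }
}
```

In the last call, the leading "7" is skipped because usePattern keeps the matcher's position just after "abc" rather than restarting from the beginning of the text.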

This static method is missing from Java 1.4.2:

  • Pattern.quote
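A short sketch of what Pattern.quote is for: treating arbitrary text as a literal within a regex. The class QuoteDemo and the helper containsLiteral are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // True if needle occurs in haystack, with every character of needle
    // taken literally rather than as a regex metacharacter.
    static boolean containsLiteral(String haystack, String needle) {
        return Pattern.compile(Pattern.quote(needle)).matcher(haystack).find();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("$5.00"));                        // \Q$5.00\E
        System.out.println(containsLiteral("cost: $5.00 (approx)", "$5.00")); // true
        System.out.println(containsLiteral("cost: $5X00", "$5.00"));          // false
    }
}
```

Without the call to Pattern.quote, the "$" would be an end-of-line anchor and the "." would match any character, so the unquoted regex would not mean what the literal text suggests.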

8.10.1.2. Unicode-support differences between 1.4.2 and 1.5.0

The following Unicode-related changes took place between 1.4.2 and 1.5.0:

  • Unicode support was upgraded from Unicode Version 3.0.0 in Java 1.4.2 to Unicode Version 4.0.0 in Java 1.5.0. This change impacts a number of things, such as what characters are defined (Unicode Version 3.0.0, for example, has no code points beyond \uFFFF) and what properties they have, as well as the definition of Unicode blocks.

  • The way in which block names are referenced via \p{…} and \P{…} was enhanced. (See Java's documentation for Character.UnicodeBlock for the list of blocks and their official names.)

    In Java 1.4.2, the rule was "remove spaces from the official block name and prefix In." As such, block references looked like \p{InHangulJamo} and \p{InArabicPresentationForms-A}.

    Two additional block-name forms were added in 1.5.0. One prefixes 'In' to the official block name, so that you can use names like \p{InHangul Jamo} and \p{InArabic Presentation Forms-A}. The other allows you to prefix 'In' to the Java identifier for the block (which is a version of the official name with spaces and hyphens replaced by underscores): \p{InHangul_Jamo} and \p{InArabic_Presentation_Forms_A}.

  • Java 1.5.0 has fixed an odd bug in 1.4.2 that required the Arabic Presentation Forms-B and Latin Extended-B blocks to be referenced as if the ending "B" were actually "Bound," that is, as \p{InArabicPresentationForms-Bound} and \p{InLatinExtended-Bound}.

  • Java 1.5.0 added regex support for Java's Character isSomething methods (☞ 369).
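The block-name forms and the isSomething support described in the list above can be tried with a sketch like the following, assuming Java 1.5.0 or later. The class UnicodeNamesDemo and the helper hasProperty are hypothetical names; \p{javaLowerCase} is used here as one example of a property backed by a Character isSomething method (Character.isLowerCase).

```java
import java.util.regex.Pattern;

class UnicodeNamesDemo {
    // True if the string s, taken as a whole, is matched by \p{name}.
    static boolean hasProperty(String name, String s) {
        return Pattern.matches("\\p{" + name + "}", s);
    }

    public static void main(String[] args) {
        String jamo = "\u1100"; // a character in the Hangul Jamo block

        // The form that Java 1.4.2 already accepted: spaces removed.
        System.out.println(hasProperty("InHangulJamo", jamo));   // true
        // The two forms added in Java 1.5.0: the official name, and the Java identifier.
        System.out.println(hasProperty("InHangul Jamo", jamo));  // true
        System.out.println(hasProperty("InHangul_Jamo", jamo));  // true

        // A property backed by Character.isLowerCase.
        System.out.println(hasProperty("javaLowerCase", "a"));   // true
        System.out.println(hasProperty("javaLowerCase", "A"));   // false
    }
}
```

On Java 1.4.2, only the first of these forms is expected to compile as a regex; the others should fail with a PatternSyntaxException.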

8.10.2. Differences Between 1.5.0 and 1.6

The version of Java 1.6 in release as of this writing (the second beta) has only two minor regex-related changes from Java 1.5.0:

  • Previously missing support for the Pi and Pf Unicode categories has been included in Java 1.6.

  • The \Q…\E construct has been fixed in Java 1.6 so that it works reliably even within a character class.
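Both changes can be checked with a small sketch such as this one, which assumes a Java 1.6 or later runtime. The class Java16FixesDemo and its helper methods are hypothetical names chosen for the example.

```java
import java.util.regex.Pattern;

class Java16FixesDemo {
    // True if s is a single character in the Pi (initial quote) or
    // Pf (final quote) Unicode category.
    static boolean isInitialOrFinalQuote(String s) {
        return Pattern.matches("[\\p{Pi}\\p{Pf}]", s);
    }

    // True if s consists only of the characters ^, -, and ], which are
    // quoted with \Q...\E inside the character class to make them literal.
    static boolean onlyQuotedClassChars(String s) {
        return Pattern.matches("[\\Q^-]\\E]+", s);
    }

    public static void main(String[] args) {
        System.out.println(isInitialOrFinalQuote("\u00AB")); // left guillemet, category Pi
        System.out.println(isInitialOrFinalQuote("\u00BB")); // right guillemet, category Pf
        System.out.println(isInitialOrFinalQuote("\""));     // plain ASCII quote, neither
        System.out.println(onlyQuotedClassChars("]-^"));
        System.out.println(onlyQuotedClassChars("a"));
    }
}
```

On Java 1.5.0, neither helper should be relied on: the categories are unsupported there, and \Q…\E within a class is the construct the fix addresses.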


