Bean Sheet Data Formats

When performing operations on cells, every cell in a Bean Sheet document is thought of as containing a Java Bean object¹. Such an object can be the result of a formula evaluation or it could simply occupy the cell in question. However, from the standpoint of data entry and display, as well as certain export-type persistence models, there needs to be a way to present non-textual cell contents and vice versa -- parse textual input to produce correct Java Beans. Bean Sheet relies on standard Java text format mechanism as it applies to objects. A user can apply a format to a cell or column range, which then governs the way objects are formatted and parsed. Anytime, a format is present and the formatting or parsing operation fails, Bean Sheet silently fails over to the default way of formatting and parsing cell data -- using object's toString() method on display and setting the value to a String object representing textual input on data entry. There are 3 types of formats that can be applied to a cell or column range: date / time, number, and message.

Date / Time Format²

Date and Time Patterns

Date and time formats are specified by date and time pattern strings. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters representing the components of a date or time string. Text can be quoted using single quotes (') to avoid interpretation. "''" represents a single quote. All other characters are not interpreted; they're simply copied into the output string during formatting or matched against the input string during parsing.

The following pattern letters are defined (all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved):

Letter	Date or Time Component	Presentation	Examples
`G`	Era designator	Text	`AD`
`y`	Year	Year	`1996`; `96`
`M`	Month in year	Month	`July`; `Jul`; `07`
`w`	Week in year	Number	`27`
`W`	Week in month	Number	`2`
`D`	Day in year	Number	`189`
`d`	Day in month	Number	`10`
`F`	Day of week in month	Number	`2`
`E`	Day in week	Text	`Tuesday`; `Tue`
`a`	Am/pm marker	Text	`PM`
`H`	Hour in day (0-23)	Number	`0`
`k`	Hour in day (1-24)	Number	`24`
`K`	Hour in am/pm (0-11)	Number	`0`
`h`	Hour in am/pm (1-12)	Number	`12`
`m`	Minute in hour	Number	`30`
`s`	Second in minute	Number	`55`
`S`	Millisecond	Number	`978`
`z`	Time zone	General time zone	`Pacific Standard Time`; `PST`; `GMT-08:00`
`Z`	Time zone	RFC 822 time zone	`-0800`

Pattern letters are usually repeated, as their number determines the exact presentation:

Text: For formatting, if the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used if available. For parsing, both forms are accepted, independent of the number of pattern letters.
Number: For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.
Year: For formatting, if the number of pattern letters is 2, the year is truncated to 2 digits; otherwise it is interpreted as a number.

For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.

For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits, as defined by Character.isDigit(char), will be parsed into the default century. Any other numeric string, such as a one digit string, a three or more digit string, or a two digit string that isn't all digits (for example, "-1"), is interpreted literally. So "01/02/3" or "01/02/003" are parsed, using the same pattern, as Jan 2, 3 AD. Likewise, "01/02/-3" is parsed as Jan 2, 4 BC.
Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number.
General time zone: Time zones are interpreted as text if they have names. For time zones representing a GMT offset value, the following syntax is used:
```
     GMTOffsetTimeZone:
             GMT Sign Hours : Minutes
     Sign: one of
             + -
     Hours:
             Digit
             Digit Digit
     Minutes:
             Digit Digit
     Digit: one of
             0 1 2 3 4 5 6 7 8 9
```
Hours must be between 0 and 23, and Minutes must be between 00 and 59. The format is locale independent and digits must be taken from the Basic Latin block of the Unicode standard.

For parsing, RFC 822 time zones are also accepted.
RFC 822 time zone: For formatting, the RFC 822 4-digit time zone format is used:
```
     RFC822TimeZone:
             Sign TwoDigitHours Minutes
     TwoDigitHours:
             Digit Digit
```
TwoDigitHours must be between 00 and 23. Other definitions are as for general time zones.

For parsing, general time zones are also accepted.

SimpleDateFormat also supports localized date and time pattern strings. In these strings, the pattern letters described above may be replaced with other, locale dependent, pattern letters. SimpleDateFormat does not deal with the localization of text other than the pattern letters; that's up to the client of the class.

Examples

The following examples show how date and time patterns are interpreted in the U.S. locale. The given date and time are 2001-07-04 12:08:56 local time in the U.S. Pacific Time time zone.

Date and Time Pattern	Result
`"yyyy.MM.dd G 'at' HH:mm:ss z"`	`2001.07.04 AD at 12:08:56 PDT`
`"EEE, MMM d, ''yy"`	`Wed, Jul 4, '01`
`"h:mm a"`	`12:08 PM`
`"hh 'o''clock' a, zzzz"`	`12 o'clock PM, Pacific Daylight Time`
`"K:mm a, z"`	`0:08 PM, PDT`
`"yyyyy.MMMMM.dd GGG hh:mm aaa"`	`02001.July.04 AD 12:08 PM`
`"EEE, d MMM yyyy HH:mm:ss Z"`	`Wed, 4 Jul 2001 12:08:56 -0700`
`"yyMMddHHmmssZ"`	`010704120856-0700`

Number Format³

Patterns

DecimalFormat patterns have the following syntax:

 Pattern:
         PositivePattern
         PositivePattern ; NegativePattern
 PositivePattern:
         Prefixopt Number Suffixopt
 NegativePattern:
         Prefixopt Number Suffixopt
 Prefix:
         any Unicode characters except \uFFFE, \uFFFF, and special characters
 Suffix:
         any Unicode characters except \uFFFE, \uFFFF, and special characters
 Number:
         Integer Exponentopt
         Integer . Fraction Exponentopt
 Integer:
         MinimumInteger
         #
         # Integer
         # , Integer
 MinimumInteger:
         0
         0 MinimumInteger
         0 , MinimumInteger
 Fraction:
         MinimumFractionopt OptionalFractionopt
 MinimumFraction:
         0 MinimumFractionopt
 OptionalFraction:
         # OptionalFractionopt
 Exponent:
         E MinimumExponent
 MinimumExponent:
         0 MinimumExponentopt

A DecimalFormat pattern contains a positive and negative subpattern, for example, "#,##0.00;(#,##0.00)". Each subpattern has a prefix, numeric part, and suffix. The negative subpattern is optional; if absent, then the positive subpattern prefixed with the localized minus sign (code>'-' in most locales) is used as the negative subpattern. That is, "0.00" alone is equivalent to "0.00;-0.00". If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are all the same as the positive pattern. That means that "#,##0.0#;(#)" produces precisely the same behavior as "#,##0.0#;(#,##0.0#)".

The prefixes, suffixes, and various symbols used for infinity, digits, thousands separators, decimal separators, etc. may be set to arbitrary values, and they will appear properly during formatting. However, care must be taken that the symbols and strings do not conflict, or parsing will be unreliable. For example, either the positive and negative prefixes or the suffixes must be distinct for DecimalFormat.parse() to be able to distinguish positive from negative values. (If they are identical, then DecimalFormat will behave as if no negative subpattern was specified.) Another example is that the decimal separator and thousands separator should be distinct characters, or parsing will be impossible.

The grouping separator is commonly used for thousands, but in some countries it separates ten-thousands. The grouping size is a constant number of digits between the grouping characters, such as 3 for 100,000,000 or 4 for 1,0000,0000. If you supply a pattern with multiple grouping characters, the interval between the last one and the end of the integer is the one that is used. So "#,##,###,####" == "######,####" == "##,####,####".

Special Pattern Characters

Many characters in a pattern are taken literally; they are matched during parsing and output unchanged during formatting. Special characters, on the other hand, stand for other characters, strings, or classes of characters. They must be quoted, unless noted otherwise, if they are to appear in the prefix or suffix as literals.

The characters listed here are used in non-localized patterns. Localized patterns use the corresponding characters taken from this formatter's DecimalFormatSymbols object instead, and these characters lose their special status. Two exceptions are the currency sign and quote, which are not localized.

Symbol	Location	Localized?	Meaning
`0`	Number	Yes	Digit
`#`	Number	Yes	Digit, zero shows as absent
`.`	Number	Yes	Decimal separator or monetary decimal separator
`-`	Number	Yes	Minus sign
`,`	Number	Yes	Grouping separator
`E`	Number	Yes	Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix.
`;`	Subpattern boundary	Yes	Separates positive and negative subpatterns
`%`	Prefix or suffix	Yes	Multiply by 100 and show as percentage
`\u2030`	Prefix or suffix	Yes	Multiply by 1000 and show as per mille
`¤` (`\u00A4`)	Prefix or suffix	No	Currency sign, replaced by currency symbol. If doubled, replaced by international currency symbol. If present in a pattern, the monetary decimal separator is used instead of the decimal separator.
`'`	Prefix or suffix	No	Used to quote special characters in a prefix or suffix, for example, `"'#'#"` formats 123 to `"#123"`. To create a single quote itself, use two in a row: `"# o''clock"`.

Scientific Notation

Numbers in scientific notation are expressed as the product of a mantissa and a power of ten, for example, 1234 can be expressed as 1.234 x 10^3. The mantissa is often in the range 1.0 <= x < 10.0, but it need not be. DecimalFormat can be instructed to format and parse scientific notation only via a pattern; there is currently no factory method that creates a scientific notation format. In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".

The number of digit characters after the exponent character gives the minimum exponent digit count. There is no maximum. Negative exponents are formatted using the localized minus sign, not the prefix and suffix from the pattern. This allows patterns such as "0.###E0 m/s".
The minimum and maximum number of integer digits are interpreted together:
- If the maximum number of integer digits is greater than their minimum number and greater than 1, it forces the exponent to be a multiple of the maximum number of integer digits, and the minimum number of integer digits to be interpreted as 1. The most common use of this is to generate engineering notation, in which the exponent is a multiple of three, e.g., "##0.#####E0". Using this pattern, the number 12345 formats to "12.345E3", and 123456 formats to "123.456E3".
- Otherwise, the minimum number of integer digits is achieved by adjusting the exponent. Example: 0.00123 formatted with "00.###E0" yields "12.3E-4".
The number of significant digits in the mantissa is the sum of the minimum integer and maximum fraction digits, and is unaffected by the maximum integer digits. For example, 12345 formatted with "##0.##E0" is "12.3E3". To show all digits, set the significant digits count to zero. The number of significant digits does not affect parsing.
Exponential patterns may not contain grouping separators.

Rounding

DecimalFormat uses half-even rounding (see ROUND_HALF_EVEN) for formatting.

Digits

For formatting, DecimalFormat uses the ten consecutive characters starting with the localized zero digit defined in the DecimalFormatSymbols object as digits. For parsing, these digits as well as all Unicode decimal digits, as defined by Character.digit, are recognized.

Special Values

NaN is formatted as a single character, typically \uFFFD. This character is determined by the DecimalFormatSymbols object. This is the only value for which the prefixes and suffixes are not used.

Infinity is formatted as a single character, typically \u221E, with the positive or negative prefixes and suffixes applied. The infinity character is determined by the DecimalFormatSymbols object.

Negative zero ("-0") parses to Double(-0.0), unless isParseIntegerOnly() is true, in which case it parses to Long(0).

Message Format⁴

Patterns and Their Interpretation

MessageFormat uses patterns of the following form:

 MessageFormatPattern:
         String
         MessageFormatPattern FormatElement String

 FormatElement:
         { ArgumentIndex }
         { ArgumentIndex , FormatType }
         { ArgumentIndex , FormatType , FormatStyle }

 FormatType: one of 
         number date time choice

 FormatStyle:
         short
         medium
         long
         full
         integer
         currency
         percent
         SubformatPattern

 String:
         StringPartopt
         String StringPart

 StringPart:
         ''
         ' QuotedString '
         UnquotedString

 SubformatPattern:
         SubformatPatternPartopt
         SubformatPattern SubformatPatternPart

 SubFormatPatternPart:
         ' QuotedPattern '
         UnquotedPattern

Within a String, "''" represents a single quote. A QuotedString can contain arbitrary characters except single quotes; the surrounding single quotes are removed. An UnquotedString can contain arbitrary characters except single quotes and left curly brackets. Thus, a string that should result in the formatted message "'{0}'" can be written as "'''{'0}''" or "'''{0}'''".

Within a SubformatPattern, different rules apply. A QuotedPattern can contain arbitrary characters except single quotes; but the surrounding single quotes are not removed, so they may be interpreted by the subformat. For example, "{1,number,$'#',##}" will produce a number format with the pound-sign quoted, with a result such as: "$#31,45". An UnquotedPattern can contain arbitrary characters except single quotes, but curly braces within it must be balanced. For example, "ab {0} de" and "ab '}' de" are valid subformat patterns, but "ab {0'}' de" and "ab } de" are not.

The ArgumentIndex value is a non-negative integer written using the digits '0' through '9', and represents an index into the arguments array passed to the format methods or the result array returned by the parse methods.

The FormatType and FormatStyle values are used to create a Format instance for the format element. The following table shows how the values map to Format instances. Combinations not shown in the table are illegal. A SubformatPattern must be a valid pattern string for the Format subclass used.

Format Type	Format Style	Subformat Created
(none)	(none)	`null`
`number`	(none)	`NumberFormat.getInstance(getLocale())`
	`integer`	`NumberFormat.getIntegerInstance(getLocale())`
	`currency`	`NumberFormat.getCurrencyInstance(getLocale())`
	`percent`	`NumberFormat.getPercentInstance(getLocale())`
	SubformatPattern	`new DecimalFormat(subformatPattern, new DecimalFormatSymbols(getLocale()))`
`date`	(none)	`DateFormat.getDateInstance(DateFormat.DEFAULT, getLocale())`
	`short`	`DateFormat.getDateInstance(DateFormat.SHORT, getLocale())`
	`medium`	`DateFormat.getDateInstance(DateFormat.DEFAULT, getLocale())`
	`long`	`DateFormat.getDateInstance(DateFormat.LONG, getLocale())`
	`full`	`DateFormat.getDateInstance(DateFormat.FULL, getLocale())`
	SubformatPattern	`new SimpleDateFormat(subformatPattern, getLocale())`
`time`	(none)	`DateFormat.getTimeInstance(DateFormat.DEFAULT, getLocale())`
	`short`	`DateFormat.getTimeInstance(DateFormat.SHORT, getLocale())`
	`medium`	`DateFormat.getTimeInstance(DateFormat.DEFAULT, getLocale())`
	`long`	`DateFormat.getTimeInstance(DateFormat.LONG, getLocale())`
	`full`	`DateFormat.getTimeInstance(DateFormat.FULL, getLocale())`
	SubformatPattern	`new SimpleDateFormat(subformatPattern, getLocale())`
`choice`	SubformatPattern	`new ChoiceFormat(subformatPattern)`

Applying Formats To Existing Data

Formats are applied to cell and column ranges -- not to the objects they may contain. In the case where a set of cells contain textual data that needs to be parsed using a particular format, the following procedure is recommended:

Select appropriate cell or column range.
Select menu Table --> Column Format or Cell Format.
Specify desired format and pattern and press OK button.
If the format is being applied to a column range, select cells containing data that needs to be parsed. If there are gaps in the column range, select cells from consecutive column ranges and perform the following steps for each such selection.
Select menu Edit --> Copy.
Select menu Edit --> Paste.

1It has to be a Java Bean in order for most persistence mechanisms to work. Without worrying about persistence, any Java object could theoretically occupy a cell.

2 http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html

3 http://java.sun.com/j2se/1.4.2/docs/api/java/text/DecimalFormat.html

4 http://java.sun.com/j2se/1.4.2/docs/api/java/text/MessageFormat.html