jsChunkEx: xTalk Chunk Expressions in JavaScript

The jsChunkEx library is a JavaScript implementation of a feature of xTalk languages known as chunk expressions. Chunk expressions are used to manipulate text strings using natural-language concepts such as characters, words, items, sentences, lines, and paragraphs.

The sample text below shows how a body of text is split into chunks. Each chunk is identified by a descriptor (its type and index). Because chunk expressions originate in natural language, indices start at 1.

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Everyone is entitled to all the rights and freedoms set forth in this Declaration, without distinction of any kind, such as race, colour, sex, language, religion, political or other opinion, national or social origin, property, birth or other status. Furthermore, no distinction shall be made on the basis of the political, jurisdictional or international status of the country or territory to which a person belongs, whether it be independent, trust, non-self-governing or under any other limitation of sovereignty.

How To Use

To use jsChunkEx, embed the following script tag on your page:

<script type="text/javascript" src="http://www.kreativekorp.com/lib/jsChunkEx/jsChunkEx.js"></script>

Or, if you prefer to host it yourself, download jsChunkEx.zip.

Call jsChunkEx using any of the following APIs:

jsChunkEx.countChunks(text, descriptors, chunkType)
Count the number of chunks in a string.
jsChunkEx.splitChunks(text, descriptors, chunkType)
Split a string into an array of its constituent chunks.
jsChunkEx.findChunk(text, descriptors)
Determine the start and end offset of a chunk in a string.
jsChunkEx.findChunkToDelete(text, descriptors)
Determine the start offset of a chunk and the start offset of the chunk that follows it.
jsChunkEx.getChunk(text, descriptors)
Return the substring corresponding to a chunk of a string.
jsChunkEx.deleteChunk(text, descriptors)
Return a string with the specified chunk removed.
jsChunkEx.replaceChunk(text, descriptors, replacement)
Return a string with a chunk replaced with another string.
jsChunkEx.prependToChunk(text, descriptors, replacement)
Return a string with another string prepended to a chunk.
jsChunkEx.appendToChunk(text, descriptors, replacement)
Return a string with another string appended to a chunk.
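As a rough illustration of how these calls compose, the sketch below walks every chunk of a given type using only countChunks and getChunk. The helper name listChunks and its lib parameter are hypothetical; lib is assumed to be the global jsChunkEx object once the script has loaded. In practice splitChunks already returns this array, so the loop is only meant to show the 1-based indexing.

```javascript
// Hypothetical helper, not part of jsChunkEx.
// `lib` is assumed to be the jsChunkEx object loaded by the script tag.
function listChunks(lib, text, chunkType) {
  const count = lib.countChunks(text, chunkType);
  const chunks = [];
  // Chunk indices are 1-based, so loop from 1 through count inclusive.
  for (let i = 1; i <= count; i++) {
    chunks.push(lib.getChunk(text, chunkType, i));
  }
  return chunks;
}

// In a page that has loaded the library, a call might look like:
// listChunks(jsChunkEx, 'Hello, my name is Rebecca.', jsChunkEx.WORD);
```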

Where descriptors is a series of arguments of the following forms:

chunkType, index
a single chunk at the given index
chunkType, startIndex, endIndex
a range of chunks between two indices
chunkType, jsChunkEx.BY_CONTENT, stringToMatch
a chunk matching a given string
chunkType, jsChunkEx.BY_CONTENT, regExpToMatch
a chunk matching a given regular expression
jsChunkEx.LINE_ENDING, lineEnding
Change the line ending used by the jsChunkEx.LINE chunk type.
jsChunkEx.ITEM_DELIMITER, itemDelimiter
Change the delimiter used by the jsChunkEx.ITEM chunk type.
jsChunkEx.COLUMN_DELIMITER, columnDelimiter
Change the delimiter used by the jsChunkEx.COLUMN chunk type.
jsChunkEx.ROW_DELIMITER, rowDelimiter
Change the delimiter used by the jsChunkEx.ROW chunk type.
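The following sketch shows how a descriptor list might be assembled in practice. Both helpers are hypothetical wrappers, and lib is assumed to be the loaded jsChunkEx object. Two assumptions are worth flagging: that a delimiter-setting pair applies to the ITEM descriptor that follows it in the same call, and that getChunk accepts chained descriptors (outer chunk first) in the same way the countChunks entry in the Examples section does.

```javascript
// Hypothetical wrappers around the documented getChunk call.
// `lib` is assumed to be the jsChunkEx object loaded by the script tag.

// Assumption: the ITEM_DELIMITER pair affects ITEM descriptors after it.
function getItemWithDelimiter(lib, text, delimiter, index) {
  return lib.getChunk(text, lib.ITEM_DELIMITER, delimiter, lib.ITEM, index);
}

// Assumption: descriptors chain from the outer chunk to the inner chunk,
// so this asks for character `charIndex` of word `wordIndex`.
function getCharOfWord(lib, text, wordIndex, charIndex) {
  return lib.getChunk(text, lib.WORD, wordIndex, lib.CHARACTER, charIndex);
}

// In a page that has loaded the library, calls might look like:
// getItemWithDelimiter(jsChunkEx, 'red;green;blue', ';', 2);
// getCharOfWord(jsChunkEx, 'Hello, my name is Rebecca.', 3, 1);
```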

And chunkType is one of the following:

jsChunkEx.CHARACTER
A Unicode character.
jsChunkEx.WORD
Sequences of non-whitespace characters separated by whitespace.
jsChunkEx.ITEM
Sequences of characters delimited by the jsChunkEx.ITEM_DELIMITER.
jsChunkEx.SENTENCE
Sequences of characters ending with a period, exclamation point, or question mark.
jsChunkEx.LINE
Sequences of characters delimited by newline, carriage return, CRLF, or the Unicode line separator or paragraph separator character.
jsChunkEx.PARAGRAPH
Sequences of characters separated by newline, carriage return, or Unicode line separator or paragraph separator characters.
jsChunkEx.COLUMN
Sequences of characters delimited by the jsChunkEx.COLUMN_DELIMITER.
jsChunkEx.ROW
Sequences of characters delimited by the jsChunkEx.ROW_DELIMITER.
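To make the WORD and SENTENCE definitions concrete, here is a minimal plain-JavaScript approximation of each. It does not use jsChunkEx and is not the library's implementation; the function names splitWords and splitSentences are made up for illustration, and the library may treat edge cases (such as whitespace around sentences or text with no closing punctuation) differently.

```javascript
// Approximation of WORD: runs of non-whitespace separated by whitespace.
function splitWords(text) {
  // filter(Boolean) drops the empty strings produced by leading or
  // trailing whitespace.
  return text.split(/\s+/).filter(Boolean);
}

// Approximation of SENTENCE: text up to and including a run of '.', '!'
// or '?'. Trailing text with no terminator is kept as a final sentence.
function splitSentences(text) {
  const matches = text.match(/[^.!?]+(?:[.!?]+|$)/g) || [];
  return matches
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

console.log(splitWords('Hello, my name is Rebecca.').length); // 5
console.log(splitSentences('Is it free? Yes! Good.'));
// [ 'Is it free?', 'Yes!', 'Good.' ]
```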

And index is a positive integer for a chunk counted from the beginning of a string, a negative integer for a chunk counted from the end of a string, or one of the following special values:

jsChunkEx.ANY
A random chunk.
jsChunkEx.FIRST
The first chunk.
jsChunkEx.MIDDLE
The middle chunk.
jsChunkEx.LAST
The last chunk.
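The numeric index convention can be sketched as a mapping onto ordinary 0-based array positions. The helper resolveIndex below is hypothetical and independent of jsChunkEx. It assumes that -1 refers to the last chunk, -2 to the one before it, and so on, and it deliberately leaves out the special values (ANY, FIRST, MIDDLE, LAST), whose exact behavior is not specified here.

```javascript
// Hypothetical helper: map a 1-based or negative chunk index to a
// 0-based array position, or null if it falls outside the chunk count.
// Assumption: -1 means the last chunk.
function resolveIndex(index, count) {
  if (!Number.isInteger(index) || index === 0) {
    return null; // 0 is not a valid chunk index in a 1-based scheme
  }
  const zeroBased = index > 0 ? index - 1 : count + index;
  return zeroBased >= 0 && zeroBased < count ? zeroBased : null;
}

const words = ['Hello,', 'my', 'name', 'is', 'Rebecca.'];
console.log(words[resolveIndex(3, words.length)]);  // name
console.log(words[resolveIndex(-1, words.length)]); // Rebecca.
```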

Examples

jsChunkEx.countChunks('Hello, my name is Rebecca.', jsChunkEx.WORD)
returns 5
jsChunkEx.getChunk('Hello, my name is Rebecca.', jsChunkEx.WORD, 3)
returns "name"
jsChunkEx.countChunks('Hello, my name is Rebecca.', jsChunkEx.WORD, 3, jsChunkEx.CHARACTER)
returns 4
jsChunkEx.deleteChunk('Hello, my name is Rebecca.', jsChunkEx.WORD, 2, 4)
returns "Hello, Rebecca."
jsChunkEx.replaceChunk('Hello, my name is Rebecca.', jsChunkEx.WORD, jsChunkEx.LAST, 'Ginny.')
returns "Hello, my name is Ginny."

Feel free to experiment with jsChunkEx on JSFiddle.