Class: SentenceSplitter

SentenceSplitter is our default text splitter. It supports splitting text into sentences, paragraphs, or fixed-length chunks with overlap.

One advantage of SentenceSplitter is that, even when producing fixed-length chunks, it tries to keep sentences together.
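To make the idea of sentence-preserving chunks concrete, here is a minimal, self-contained sketch. It is not the library's implementation: `countTokens` and `packSentences` are hypothetical helpers, and whitespace-separated words stand in for real tokenizer output. The sketch packs whole sentences into chunks of at most `chunkSize` tokens and carries trailing sentences forward as overlap.

```typescript
// Hypothetical helper: counts whitespace-separated words as a stand-in
// for a real tokenizer.
function countTokens(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Hypothetical helper, not part of the library: groups whole sentences into
// chunks of at most `chunkSize` tokens, repeating up to `chunkOverlap`
// tokens' worth of trailing sentences at the start of the next chunk.
// A single sentence longer than `chunkSize` is kept whole rather than cut.
function packSentences(
  sentences: string[],
  chunkSize: number,
  chunkOverlap: number,
): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let currentSize = 0;

  for (const sentence of sentences) {
    const size = countTokens(sentence);

    if (current.length > 0 && currentSize + size > chunkSize) {
      // The next sentence does not fit: close the current chunk.
      chunks.push(current.join(" "));

      // Walk backwards and keep whole sentences that fit in the overlap budget.
      const carried: string[] = [];
      let carriedSize = 0;
      for (const previous of [...current].reverse()) {
        const previousSize = countTokens(previous);
        if (carriedSize + previousSize > chunkOverlap) break;
        carried.unshift(previous);
        carriedSize += previousSize;
      }
      current = carried;
      currentSize = carriedSize;

      // Drop overlap from the front if it would push the new chunk over budget.
      while (current.length > 0 && currentSize + size > chunkSize) {
        const dropped = current.shift();
        if (dropped !== undefined) currentSize -= countTokens(dropped);
      }
    }

    current.push(sentence);
    currentSize += size;
  }

  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

const exampleChunks = packSentences(
  [
    "The cat sat.",
    "The dog ran away.",
    "Birds fly south in winter.",
    "Fish swim.",
  ],
  9, // chunkSize, in words
  4, // chunkOverlap, in words
);
// exampleChunks:
//   "The cat sat. The dog ran away."
//   "The dog ran away. Birds fly south in winter."
//   "Fish swim."
```

Note that the overlap is made of whole sentences, so the second chunk repeats "The dog ran away." in full, while "Birds fly south in winter." is too long for the overlap budget and is not repeated in the third chunk.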

Constructors

constructor

new SentenceSplitter(options?): SentenceSplitter

Parameters

Name                          Type
options?                      Object
options.chunkOverlap?         number
options.chunkSize?            number
options.chunkingTokenizerFn?  (text: string) => string[]
options.paragraphSeparator?   string
options.splitLongSentences?   boolean
options.tokenizer?            any
options.tokenizerDecoder?     any

Returns

SentenceSplitter

Defined in

packages/core/src/TextSplitter.ts:78

Properties

chunkOverlap

chunkOverlap: number

Defined in

packages/core/src/TextSplitter.ts:70


chunkSize

chunkSize: number

Defined in

packages/core/src/TextSplitter.ts:69


chunkingTokenizerFn

Private chunkingTokenizerFn: (text: string) => string[]

Type declaration

▸ (text): string[]

Parameters

Name  Type
text  string

Returns

string[]

Defined in

packages/core/src/TextSplitter.ts:75


paragraphSeparator

Private paragraphSeparator: string

Defined in

packages/core/src/TextSplitter.ts:74


splitLongSentences

Private splitLongSentences: boolean

Defined in

packages/core/src/TextSplitter.ts:76


tokenizer

Private tokenizer: any

Defined in

packages/core/src/TextSplitter.ts:72


tokenizerDecoder

Private tokenizerDecoder: any

Defined in

packages/core/src/TextSplitter.ts:73

Methods

combineTextSplits

combineTextSplits(newSentenceSplits, effectiveChunkSize): TextSplit[]

Parameters

Name                Type
newSentenceSplits   SplitRep[]
effectiveChunkSize  number

Returns

TextSplit[]

Defined in

packages/core/src/TextSplitter.ts:215


getEffectiveChunkSize

getEffectiveChunkSize(extraInfoStr?): number

Parameters

Name           Type
extraInfoStr?  string

Returns

number

Defined in

packages/core/src/TextSplitter.ts:114


getParagraphSplits

getParagraphSplits(text, effectiveChunkSize?): string[]

Parameters

Name                 Type
text                 string
effectiveChunkSize?  number

Returns

string[]

Defined in

packages/core/src/TextSplitter.ts:131


getSentenceSplits

getSentenceSplits(text, effectiveChunkSize?): string[]

Parameters

Name                 Type
text                 string
effectiveChunkSize?  number

Returns

string[]

Defined in

packages/core/src/TextSplitter.ts:157


processSentenceSplits

processSentenceSplits(sentenceSplits, effectiveChunkSize): SplitRep[]

Splits sentences into chunks if necessary.

This behavior is not ideal: it can split in the middle of a word or, in non-English text, in the middle of a Unicode code point. For that reason, splitting is turned off by default. If you need it, set the splitLongSentences option to true.
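The pitfall described above can be illustrated with a small, self-contained sketch. It does not reproduce how the library splits long sentences; `hardSplit` is a hypothetical helper that cuts a JavaScript string at fixed positions, measured in UTF-16 code units, purely as an analogy for what happens when a cut ignores word and character boundaries.

```typescript
// Hypothetical helper for illustration only: cuts a string into pieces of
// `width` UTF-16 code units with no regard for word or character boundaries.
function hardSplit(text: string, width: number): string[] {
  const pieces: string[] = [];
  for (let i = 0; i < text.length; i += width) {
    pieces.push(text.slice(i, i + width));
  }
  return pieces;
}

// A word cut in half: the 20-letter word is split after its 10th letter.
const wordPieces = hardSplit("internationalization", 10);
// wordPieces is ["internatio", "nalization"]

// U+1F600 (a grinning-face emoji) occupies two UTF-16 code units, so a
// width of 1 separates its surrogate pair. Neither piece is a valid
// character on its own, although joining them restores the original.
const emoji = "\u{1F600}";
const emojiPieces = hardSplit(emoji, 1);
// emojiPieces.length is 2, and emojiPieces.join("") === emoji
```

If downstream code encodes or displays each piece separately, the lone halves typically render as replacement characters, which is the kind of damage that leaving splitLongSentences off avoids.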

Parameters

Name                Type
sentenceSplits      string[]
effectiveChunkSize  number

Returns

SplitRep[]

Defined in

packages/core/src/TextSplitter.ts:186


splitText

splitText(text, extraInfoStr?): string[]

Parameters

Name           Type
text           string
extraInfoStr?  string

Returns

string[]

Defined in

packages/core/src/TextSplitter.ts:309


splitTextWithOverlaps

splitTextWithOverlaps(text, extraInfoStr?): TextSplit[]

Parameters

Name           Type
text           string
extraInfoStr?  string

Returns

TextSplit[]

Defined in

packages/core/src/TextSplitter.ts:281