\search

Holds a collection of methods for manipulating various types of search index.

Summary

Methods: index(), tokenize(), strip_markup(), rebuild_invindex(), sort_index(), compare_indexes(), load_invindex(), measure_invindex_load_time(), merge_into_invindex(), delete_entry(), save_invindex(), query_invindex(), extract_context(), highlight_context()
Properties: $stop_words
Constants: none

No protected or private methods or properties.

Properties

$stop_words

$stop_words : 

Words that should be excluded from the inverted index.

Methods

index()

index(string  $source) : array

Converts a source string into an index of search terms that can be merged into an inverted index.

Parameters

string $source

The source string to index.

Returns

array —

An index that represents the specified string.
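
As a rough illustration of what such an index might look like, the sketch below maps each term to its frequency and token positions, skipping stop words. The function name sketch_index(), the stop word list, and the "freq" / "offsets" layout are all assumptions made for this example; the structure index() really returns is not documented here.

```php
<?php
// Hypothetical stop word list; the class keeps its own in $stop_words.
$sketch_stop_words = ["the", "a", "of"];

// Builds term => ["freq" => int, "offsets" => int[]] from a source string.
// This layout is an assumption for illustration, not the documented format.
function sketch_index(string $source, array $stop_words): array {
    // Lowercase, then split on runs of non-word characters.
    $tokens = preg_split('/\W+/', strtolower($source), -1, PREG_SPLIT_NO_EMPTY);
    $index = [];
    foreach ($tokens as $position => $term) {
        if (in_array($term, $stop_words, true)) {
            continue; // Stop words never reach the index.
        }
        if (!isset($index[$term])) {
            $index[$term] = ["freq" => 0, "offsets" => []];
        }
        $index[$term]["freq"]++;
        $index[$term]["offsets"][] = $position;
    }
    return $index;
}

$index = sketch_index("The cat sat on the cat mat", $sketch_stop_words);
```

Here "cat" ends up with a frequency of 2 and token positions 1 and 5, while "the" is dropped entirely.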

tokenize()

tokenize(string  $source) : array

Converts a source string into a series of raw tokens.

Parameters

string $source

The source string to process.

Returns

array —

An array of raw tokens extracted from the specified source string.
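
A minimal sketch of one way to split a string into raw tokens follows. It assumes that a token is any run of word characters; the splitting rules tokenize() really applies may differ, and sketch_tokenize() is a hypothetical name.

```php
<?php
// Hypothetical stand-in for tokenize(); the real splitting rules may differ.
function sketch_tokenize(string $source): array {
    // Split on every run of non-word characters and drop empty pieces.
    return preg_split('/\W+/', $source, -1, PREG_SPLIT_NO_EMPTY);
}

$tokens = sketch_tokenize("The quick, brown fox!");
```

With this rule the example yields the four tokens "The", "quick", "brown" and "fox", with punctuation discarded and case left untouched.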

strip_markup()

strip_markup(string  $source) : string

Removes (most) Markdown markup from the specified string.

Stripped strings are not suitable for indexing!

Parameters

string $source

The source string to process.

Returns

string —

The stripped string.

rebuild_invindex()

rebuild_invindex(boolean  $output = true) 

Rebuilds the master inverted index and clears the page id index.

Parameters

boolean $output

Whether to send progress information to the user's browser.

sort_index()

sort_index(array  $index) 

Sorts an index alphabetically. Will also sort an inverted index.

This allows us to do a binary search instead of a regular sequential search.

Parameters

array $index

The index to sort.
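
To show why sorted terms matter, the sketch below sorts an index by term and then runs a binary search over the sorted term list. It assumes the index is keyed by term; sketch_binary_search() is a hypothetical helper and not part of this class.

```php
<?php
// Binary search over a sorted list of terms; returns the position or -1.
function sketch_binary_search(array $sorted_terms, string $needle): int {
    $low = 0;
    $high = count($sorted_terms) - 1;
    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);
        $cmp = strcmp($sorted_terms[$mid], $needle);
        if ($cmp === 0) {
            return $mid;
        }
        if ($cmp < 0) {
            $low = $mid + 1;   // Needle sorts after the midpoint.
        } else {
            $high = $mid - 1;  // Needle sorts before the midpoint.
        }
    }
    return -1;
}

// Assumed layout: term => data. Sort by term, then search the term list.
$index = ["mat" => 1, "cat" => 2, "sat" => 1];
ksort($index, SORT_STRING);
$terms = array_keys($index);
$found = sketch_binary_search($terms, "mat");
```

Each comparison halves the remaining range, so a lookup takes a logarithmic rather than linear number of steps.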

compare_indexes()

compare_indexes(array  $oldindex, array  $newindex, array  $changed, array  $removed) 

Compares two *regular* indexes to find the differences between them.

Parameters

array $oldindex

The old index.

array $newindex

The new index.

array $changed

An array to be filled with the nterms of all the changed entries.

array $removed

An array to be filled with the nterms of all the removed entries.
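
The sketch below illustrates one way such a comparison could fill the two output arrays. It assumes that both indexes are keyed by term, that the output arrays are passed by reference, and that a term counts as changed when it is new or its entry differs; none of this is confirmed by the documentation above, and sketch_compare_indexes() is a hypothetical name.

```php
<?php
// Hypothetical comparison: fills $changed and $removed by reference.
function sketch_compare_indexes(array $old, array $new, array &$changed, array &$removed): void {
    foreach ($new as $term => $entry) {
        // Assumption: new terms and terms with a different entry both count as changed.
        if (!array_key_exists($term, $old) || $old[$term] != $entry) {
            $changed[] = $term;
        }
    }
    foreach ($old as $term => $entry) {
        // Present before but missing now, so it was removed.
        if (!array_key_exists($term, $new)) {
            $removed[] = $term;
        }
    }
}

$old = ["cat" => ["freq" => 2], "sat" => ["freq" => 1]];
$new = ["cat" => ["freq" => 3], "mat" => ["freq" => 1]];
$changed = [];
$removed = [];
sketch_compare_indexes($old, $new, $changed, $removed);
```

After the call, $changed holds "cat" and "mat", and $removed holds "sat".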

load_invindex()

load_invindex(string  $invindex_filename) 

Reads in and parses an inverted index.

Parameters

string $invindex_filename

The path to the inverted index to parse.

measure_invindex_load_time()

measure_invindex_load_time(string  $invindex_filename) 

Reads in and parses an inverted index, measuring the time it takes to do so.

Parameters

string $invindex_filename

The path to the inverted index file to parse.

merge_into_invindex()

merge_into_invindex(array  $invindex, integer  $pageid, array  $index, array  $removals = array()) 

Merge an index into an inverted index.

Parameters

array $invindex

The inverted index to merge into.

integer $pageid

The id of the page to assign to the index that's being merged.

array $index

The regular index to merge.

array $removals

An array of index entries to remove from the inverted index. Useful for applying changes to an inverted index instead of deleting and remerging an entire page's index.
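
A minimal sketch of such a merge is shown below. It assumes the inverted index maps each term to an array of page id => entry, that it is modified by reference, and that $removals is a list of terms; these are illustrative guesses, and sketch_merge() is a hypothetical name.

```php
<?php
// Hypothetical merge into an inverted index shaped as term => [pageid => entry].
function sketch_merge(array &$invindex, int $pageid, array $index, array $removals = []): void {
    // First drop this page from every term listed for removal.
    foreach ($removals as $term) {
        if (!isset($invindex[$term])) {
            continue;
        }
        unset($invindex[$term][$pageid]);
        if (count($invindex[$term]) === 0) {
            unset($invindex[$term]); // No page uses this term any more.
        }
    }
    // Then add or overwrite this page's entry for each term in the index.
    foreach ($index as $term => $entry) {
        $invindex[$term][$pageid] = $entry;
    }
}

$invindex = [];
sketch_merge($invindex, 1, ["cat" => ["freq" => 2], "sat" => ["freq" => 1]]);
sketch_merge($invindex, 2, ["cat" => ["freq" => 1]]);
// Page 1 was edited: "cat" now appears 3 times and "sat" is gone.
sketch_merge($invindex, 1, ["cat" => ["freq" => 3]], ["sat"]);
```

The last call applies only the difference for page 1, which is the use case the $removals parameter describes.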

delete_entry()

delete_entry(  $invindex, \number  $pageid) 

Deletes the given page id from the given inverted index.

Parameters

$invindex

The inverted index to delete the page id from.

\number $pageid

The pageid to remove.
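
The sketch below shows how a page could be removed from every term of an inverted index. It assumes a term => [pageid => entry] layout and by-reference modification, neither of which is confirmed above; sketch_delete_entry() is a hypothetical name.

```php
<?php
// Hypothetical deletion from an inverted index shaped as term => [pageid => entry].
function sketch_delete_entry(array &$invindex, int $pageid): void {
    // foreach iterates over a copy, so unsetting inside the loop is safe.
    foreach ($invindex as $term => $pages) {
        unset($invindex[$term][$pageid]);
        if (count($invindex[$term]) === 0) {
            unset($invindex[$term]); // Drop terms that no page uses any more.
        }
    }
}

$invindex = [
    "cat" => [1 => ["freq" => 2], 2 => ["freq" => 1]],
    "sat" => [1 => ["freq" => 1]],
];
sketch_delete_entry($invindex, 1);
```

After the call only "cat" remains, pointing at page 2.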

save_invindex()

save_invindex(string  $filename, array  $invindex) 

Saves the given inverted index back to disk.

Parameters

string $filename

The path to the file to save the inverted index to.

array $invindex

The inverted index to save.

query_invindex()

query_invindex(string  $query, array  $invindex) : array

Searches the given inverted index for the specified search terms.

Parameters

string $query

The search query.

array $invindex

The inverted index to search.

Returns

array —

An array of matching pages.
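
As an illustration of querying an inverted index, the sketch below splits the query into terms, sums each page's term frequencies, and returns the pages ordered by score. The term => [pageid => ["freq" => int]] layout and the frequency-sum ranking are assumptions; the ranking query_invindex() really uses is not documented here, and sketch_query() is a hypothetical name.

```php
<?php
// Hypothetical query: returns pageid => score, best match first.
function sketch_query(string $query, array $invindex): array {
    $terms = preg_split('/\W+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    $scores = [];
    foreach ($terms as $term) {
        // Terms missing from the index simply contribute nothing.
        foreach ($invindex[$term] ?? [] as $pageid => $entry) {
            $scores[$pageid] = ($scores[$pageid] ?? 0) + $entry["freq"];
        }
    }
    arsort($scores); // Highest score first, keeping page ids as keys.
    return $scores;
}

$invindex = [
    "cat" => [1 => ["freq" => 2], 2 => ["freq" => 1]],
    "mat" => [2 => ["freq" => 3]],
];
$results = sketch_query("cat mat", $invindex);
```

Page 2 matches both terms and scores 4, so it is listed ahead of page 1, which scores 2.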

extract_context()

extract_context(string  $query, string  $source) : string

Given a search query, extracts a context string (in HTML) that could be displayed in a list of search results.

Parameters

string $query

The search query to generate the context for.

string $source

The page source to extract the context from.

Returns

string —

The generated context string.
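
One simple way to build such a context is to take a window of text around the first matching query term, as sketched below. The $radius parameter, the single-window approach, and the name sketch_extract_context() are assumptions for illustration; the sketch is also byte-based and so not safe for multibyte text.

```php
<?php
// Hypothetical context extraction: a window around the first matching term.
function sketch_extract_context(string $query, string $source, int $radius = 20): string {
    $terms = preg_split('/\W+/', $query, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        $pos = stripos($source, $term); // Case-insensitive search.
        if ($pos === false) {
            continue;
        }
        $start = max(0, $pos - $radius);
        $length = ($pos - $start) + strlen($term) + $radius;
        // Escape the snippet so that it is safe to embed in HTML.
        return htmlentities(substr($source, $start, $length));
    }
    return ""; // No query term occurs in the source.
}

$context = sketch_extract_context("fox", "The quick brown fox jumps over the lazy dog", 6);
```

With a radius of 6 the example returns "brown fox jumps".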

highlight_context()

highlight_context(string  $query, string  $context) : string

Highlights the keywords of a context string.

Parameters

string $query

The query to use when highlighting.

string $context

The context string to highlight.

Returns

string —

The highlighted (HTML) string.
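
A minimal sketch of keyword highlighting follows. It wraps each whole-word match in a strong element in a single pass, assuming the context string has already been escaped for HTML. The choice of tag and the name sketch_highlight() are assumptions, not taken from the class.

```php
<?php
// Hypothetical highlighter: wraps whole-word query matches in <strong>.
function sketch_highlight(string $query, string $context): string {
    $terms = preg_split('/\W+/', $query, -1, PREG_SPLIT_NO_EMPTY);
    if (count($terms) === 0) {
        return $context;
    }
    // Quote each term so that it is matched literally inside the pattern.
    $quoted = array_map(fn($term) => preg_quote($term, '/'), $terms);
    // One alternation pattern, so inserted tags are never rescanned.
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    return preg_replace($pattern, '<strong>$1</strong>', $context);
}

$html = sketch_highlight("cat mat", "The cat sat on the Mat.");
```

Matching is case-insensitive and keeps the original capitalisation, so "Mat" is highlighted as written.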