# MarkovGrams
Experiments into markov chains, n-grams, and text generation. This repository is the result of the markov chain blog post mini-series on my blog.
- [Part 1 - n-grams](https://starbeamrainbowlabs.com/blog/article.php?article=posts/236-Markov-Chain-Part-1-N-Grams.html)
- [Part 2 - unweighted markov chains](https://starbeamrainbowlabs.com/blog/article.php?article=posts%2F238-Markov-Chains-Part-2-Unweighted-Chains.html)
- [Part 3 - weighted markov chains](https://starbeamrainbowlabs.com/blog/article.php?article=posts/285-Markov-Chains-Part-3-Weighted.html)
- [Part 4 - test data](https://starbeamrainbowlabs.com/blog/article.php?article=posts/323-MarkovGrams-Part-4-Test-Data.html)
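
For readers who haven't met the terms before, here is a minimal Python sketch of the two ideas the posts above cover: splitting words into n-grams, and using them to build a weighted markov chain that generates new words. It is not the code in this repository. The function names (`ngrams`, `build_weighted_chain`, `generate`) and the sample words are made up for illustration, and it assumes character-level n-grams.

```python
import random
from collections import Counter, defaultdict


def ngrams(word, n):
    """Return every run of n consecutive characters in word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_weighted_chain(words, n):
    """Map each (n-1)-character prefix to a Counter of the characters seen after it."""
    chain = defaultdict(Counter)
    for word in words:
        for gram in ngrams(word, n):
            chain[gram[:-1]][gram[-1]] += 1
    return chain


def generate(chain, start, length, rng=random):
    """Extend start one character at a time, picking each next character
    in proportion to how often it followed the current prefix."""
    order = len(start)
    result = start
    while len(result) < length:
        followers = chain.get(result[-order:])
        if not followers:  # dead end: nothing ever followed this prefix
            break
        chars = list(followers)
        weights = list(followers.values())
        result += rng.choices(chars, weights=weights)[0]
    return result


# Hypothetical sample words, not taken from the wordlists in this repository
words = ["red", "green", "grey", "teal", "cream"]
chain = build_weighted_chain(words, n=3)

rng = random.Random(42)  # seeded only so that repeated runs match
start = rng.choice(sorted(chain))
print(generate(chain, start, length=6, rng=rng))
```

Dropping the counts and picking uniformly among the followers would give the unweighted variant instead. Every 3-character run in the output also occurs in one of the input words, which is why the results tend to look plausible.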
## Building
Building this project should be easy. I can also provide prebuilt binaries on request (see [my website](https://starbeamrainbowlabs.com/) for contact information).
### Windows
Open the solution in Visual Studio / MonoDevelop and hit build.
### Linux and everyone else
As above with MonoDevelop, or simply run `xbuild` from the root of the repository.
## Using
Detailed usage help can be seen by simply running the tool without any arguments:
```bash
./MarkovGrams.exe
```
Linux users might need to explicitly run it with `mono`:
```bash
mono ./MarkovGrams.exe
```
## Wordlists
The `wordlists/` directory contains a few interesting wordlists I used when writing and testing this program.
Filename | Contents | Comments
----------------|--------------|------------------
`Colours.txt` | List of colours | Built with a sneaky bit of JavaScript in the developer tools of [this webpage](http://www.colorhexa.com/color-names) from [colorhexa](http://www.colorhexa.com/), which turns out to be a really useful website about colours.
`Science Words.txt` | List of cool sciencey-type words | Compiled from scratch by [Starbeamrainbowlabs](https://starbeamrainbowlabs.com/) - that's me! This list falls under the _Mozilla Public License 2.0_, as described below in the [license section](https://git.starbeamrainbowlabs.com/sbrl/MarkovGrams#license).
`Cross-Code-Items.txt` | List of items in _[CrossCode](http://cross-code.com/)_ | A list of all items in the game _CrossCode_. Scraped from [here](https://crosscode.gamepedia.com/Items) by a clever bit of bash in `wordlists/download.sh`. I obviously don't own any of these names.
`Final-Fantasy-15-Items.txt` | List of items in _[Final Fantasy 15](https://finalfantasyxv.square-enix-games.com/)_ | Another list of in-game items - this time from _Final Fantasy XV_. Scraped from [this wiki page](http://finalfantasy.wikia.com/wiki/List_of_Final_Fantasy_XV_items) and related pages. The even cleverer bit of bash that does this is also in `wordlists/download.sh`. Again, I don't own any of these :-)
`No-Mans-Sky-Items.txt` | List of items in [No Man's Sky](https://www.nomanssky.com/) | Yep, you guessed it. Don't own this. Bash available in `wordlists/download.sh` - I had a bit of trouble with this one, and had to use an awkward hack or two.
`Starbound.txt` | List of blocks and items in [Starbound](https://playstarbound.com/) | From the [Official Wiki](https://starbounder.org/) - I don't own it, I just wrote the scraper :P
### Candidates
The following pages & websites look like they show promise, but I haven't imported them yet.
- [List of fictional newspapers](https://en.wikipedia.org/wiki/List_of_fictional_newspapers)
## Credits
- The code was written by [Starbeamrainbowlabs](https://starbeamrainbowlabs.com/) - that's me too!
- I found the [Markov Chain Text Generation](http://nullprogram.com/blog/2012/09/05/) post by [Chris Wellons](http://nullprogram.com/) rather useful when writing this.
- Wordlists - See the table above
## License
This repository is licensed under the _Mozilla Public License 2.0_ (_MPL-2.0_), except where stated above: the wordlists, apart from the science words, are not covered by it. A copy of the license text can be found in the [LICENSE file](https://git.starbeamrainbowlabs.com/sbrl/MarkovGrams/src/branch/master/LICENSE) in this repository, and tldr-legal have [a summary](https://tldrlegal.com/license/mozilla-public-license-2.0-(mpl-2)) if you can't speak legalese :-)