5.0.0 first release (#154)
* Started working on brandon checker 2.0, added more docs
* Starting on cipher iface from #104
* More registry work
* Nicer registry
* More iface
* GCSE geography WAS useful
* Refactored logging, and got module loading to work
* Added skeleton of object iface
* Quick patch of registry
* Fixed docs
* Holy sheet that's cool!
* Fixing old Python
* Logging and testing refactors
* Wow I hate this stupid reference bs
* Can now do iterated encodings
* Morse optimisation
* More migration work of modules into the refactoring, and added list regex
* Added comment
* Uploaded tests
* Can't get Jupyter to work so I'm doing it in raw Python now
* Started fleshing out the builder
* Updated todo list
* Figured out hansard.txt has spelling errors; will try to apply a SpaCy model to fix them automatically
* Updates
* Added new dicts to gitignore
* Started combining the 2 dicts
* More cleaning of the dicts
* More work on fixing up the combined dict
* Text is almost made
* Adding more comments
* Working on an automated test
* More tests
* Switching to I3
* More testing stuff
* Fixing tests
* Turning the file into a proper class
* Working on automating more of the tester
* Fixed major bug
* Added targeting
* Adding more tests
* Optimised word endings
* Switching branches
* harlan pls
* Made it better
* Stopped using 20k tests
* Started automating threshold calculations
* Started writing more threshold calculations
* Maybe this will work
* Added important todo
* Started building the NATO checker
* Updated settings file
* Added sent size calculator
* This works
* Bug fixes
* This finally works LMAOOOOO yayyyy
* Done tests
* Wrote up more docs
* Started creating new brandon iface
* Building settings
* Writing documentation
* Started integrating the settings file
* Docs
* Moving to TF Lite
* More settings file stuff
* Added more docs
* More settings stuff
* More updates
* Added regexFile
* Adding more comments to settings
* Added `where` argument; the user can now easily find appdirs
* Wrote more docs
* Started writing stopwords checker
* Stopwords theoretically works and is efficient
* Changed 1k words
* Refactors
* Updates to docs
* End of night
* Last push
* Made resources easier to obtain
* Updates
* Writing more Brandon Checker
* Testing for best threshold of dict checker
* Docs
* Documentation + tests
* More brandon checker stuff
* More documentation
* Made Windows Package Manager manifest
* Changelog additions
* More changelog
* Changelog
* What
* The iface is bad, I'm sorry
* Added
* Fixed ciphey
* Whoopsie
* Here u go, broken code
* Less broken
* It constructs
* Adding Windows and macOS testing
* Removed automated tests
* More git actions
* yess
* actions
* Testing actions some more (sorry for the spam)
* Nox
* pls dont hate me
* More CI
* Added PyInstaller settings file + the entry point needed
* Updated PyInstaller
* Added packaging stuff
* goodnight
* README changes
* More README updates
* Fixing CI
* README
* One more
* Switching branches
* Updated README to look at the pretty gif
* README update
* Switching branches
* Added quorum
* Moved detectors to be only for encodings
* Added Discord group to CONTRIBUTING.md
* Added Discord links
* ciphey-iface, now featuring ActuallyWorksNow (TM) technology
* Now works even better
* Updated README
* Some more optimisations
* Added greppable back for *cowards*
* README changes
* Canonicalised handleDecodings
* Switching branches
* Made merge_dict
* Whoops
* README
* Changes
* pls github
* README updates
* More updates
* pls
* Added reverse
* Added initial CSV work
* I love Python typing
* README
* BASE64 -> Base64. Complete -> compete
* Update README.md
* Some more tweaks
* Removed important.md
* I broke main, Harlan said he will take a look tomorrow
* Added installation guide to README
* Added CI to README
* Reworked the README, I changed a lot :)
* ReadTheDocs -> Docs.Ciphey.Online
* Added new logo
* Increased size of logo
* Uploaded all lock pictures
* Added thanks to designer
* I am an idiot
* Fixed registry bs
* Added runs to steps in GitHub action
* Replaced GitHub action with one from GitHub
* Fixing Poetry and Nox issues
* Fixing Nox
* Added click spinner to indicate the program is running
* Added AppDirs command to main; now the user can find out where ciphey expects the settings file to be
* Added important links section
* Added installation guide to another place
* Almost finished args
* That should work now
* begone, tensorflow
* Started PKGBUILD & A*
* Updated README with new downloads, more A* work
* Turned important links into a table
* More A* stuff
* Before remake v1
* A* node selection
* A* finally works
* Added comments to allow for easy modification
* Switching branches
* Join brandon in
* Pseudocode for astar
* Fixing brandon checker
* pls harlan fix
* Fixed brandon
* That should be trace
* Fixed "what to use" bug
* Various prodding
* Working brandon
* First RC

Co-authored-by: Brandon <brandonskerritt51@gmail.com>
Co-authored-by: Brandon <10378052+brandonskerritt@users.noreply.github.com>
@@ -0,0 +1,2 @@
todo:
  keyword: "@TODO"
@@ -0,0 +1,38 @@
# Configuration for Lock Threads - https://github.com/dessant/lock-threads-app

# Number of days of inactivity before a closed issue or pull request is locked
daysUntilLock: 60

# Skip issues and pull requests created before a given timestamp. Timestamp must
# follow ISO 8601 (`YYYY-MM-DD`). Set to `false` to disable
skipCreatedBefore: false

# Issues and pull requests with these labels will be ignored. Set to `[]` to disable
exemptLabels: []

# Label to add before locking, such as `outdated`. Set to `false` to disable
lockLabel: false

# Comment to post before locking. Set to `false` to disable
lockComment: >
  This thread has been automatically locked since there has not been
  any recent activity after it was closed. Please open a new issue for
  related bugs.

# Assign `resolved` as the reason for locking. Set to `false` to disable
setLockReason: true

# Limit to only `issues` or `pulls`
# only: issues

# Optionally, specify configuration settings just for `issues` or `pulls`
# issues:
#   exemptLabels:
#     - help-wanted
#   lockLabel: outdated

# pulls:
#   daysUntilLock: 30

# Repository to extend settings from
# _extends: repo
@@ -0,0 +1,37 @@
name: coverage
on: [push, pull_request]
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: [3.6, 3.7, pypy3]
        # exclude:
        #   - os: macos-latest
        #     python-version: 3.8
        #   - os: windows-latest
        #     python-version: 3.6
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Tests with Nox

  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8'
          architecture: x64
      - run: pip3 install nox==2019.11.9
      - run: pip3 install poetry==1.0.5
      - run: nox --sessions tests coverage
        env:
          CODECOV_TOKEN: ${{secrets.CODECOV_TOKEN}}
@@ -1,31 +0,0 @@
name: Tests
on: [push, pull_request]
jobs:
  old-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.6', '3.7']
    name: Python ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - run: pip install nox==2019.11.9
      - run: pip install poetry==1.0.5
      - run: nox
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v1
        with:
          python-version: '3.8'
          architecture: x64
      - run: pip3 install nox==2019.11.9
      - run: pip3 install poetry==1.0.5
      - run: nox --sessions tests coverage
        env:
          CODECOV_TOKEN: ${{secrets.CODECOV_TOKEN}}
@@ -0,0 +1,19 @@
name: Tests
on: [push, pull_request]
jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.7', '3.8']
        os: [ubuntu-latest, macos-latest, windows-latest]
    name: Python ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - run: pip install nox==2019.11.9
      - run: pip install poetry==1.0.5
      - run: nox
@@ -76,7 +76,7 @@ MANIFEST
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
@@ -194,6 +194,29 @@ poetry.lock

# PyCharm
.idea/

ciphey/LanguageChecker/create\?max_size=60&spelling=US&spelling=GBs&max_variant=2&diacritic=both&special=hacker&special=roman-numerals&download=wordlist&encoding=utf-8&format=inline
ciphey/LanguageChecker/create\?max_size=80&spelling=US&spelling=GBs&spelling=GBz&spelling=CA&spelling=AU&max_variant=2&diacritic=both&special=hacker&special=roman-numerals&download=wordlist&encoding=utf-8&format=inline

ciphey/LanguageChecker/aspell.txt
dictionary.txt
aspell.txt

ciphey.spec
ciphey/__main__.spec
__main__.spec
.entry_point.spec/entry_point.spec

BEANOS INVADES THE FORTNITE ITEM SHOP AT 8_00 PM.EXE-uG0WJcr-cuI.f299.mp4.part

run.yml

tests/interface.rst

# Test Generator
test_main_generated.py
@@ -1,3 +1,48 @@
Howdy!

So, you're interested in contributing to Ciphey? 🤔

But maybe you're confused as to where to start, or you believe your coding skills aren't "good enough". Well, the latter is ridiculous! We're perfectly okay with "bad code", and if you're reading this document you're probably a great programmer anyway. After all, newbies don't often go looking for GitHub projects to contribute to 😉

Here are some ways you can contribute to Ciphey:
* Add a new language 🧏
* Add more encryption methods 📚
* Create more documentation (very important‼️ We would be eternally grateful)
* Fix bugs submitted via GitHub issues (we can support you in this 😊)
* Refactor the code base 🥺

If these sound hard, do not worry! This document will walk you through exactly how to achieve any of them. And also, your name will be added to Ciphey's contributors list, and we'll be eternally grateful! 🙏

We have a small Discord chat where you can talk to the developers and get some help. Alternatively, open a GitHub issue with your suggestion. If you want to be added to the Discord, DM us or reach out however you can.

[Discord Server](https://discord.gg/KfyRUWw)

# Add a new language 🧏
The default language checker, `brandon`, works with multiple languages. Now, this may sound daunting, but honestly, all you've got to do is take a dictionary, do a little analysis (we've written code to help you with this), add the dictionaries and analysis to a repo, and then add the option to `settings.yml`.

When I created the German module, I wrote detailed documentation on how I did it. You can read that here.
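To make the "little analysis" step above concrete, here is a minimal sketch of the kind of check a dictionary-based language module performs: what fraction of a text's words appear in a candidate word list. The function name and the word-cleaning rules are illustrative only; Ciphey's real helper code lives in the repository.

```python
def dictionary_coverage(text: str, dictionary: set) -> float:
    """Return the fraction of words in `text` found in `dictionary`.

    Words are lowercased and stripped of trailing punctuation before lookup.
    """
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in dictionary)
    return hits / len(words)
```

A language module would then compare this coverage against a tuned threshold to decide whether the text is plausible plaintext in that language.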
# Add more encryption methods 📚

# Create more documentation
Documentation is the most important part of Ciphey. No documentation is extreme code debt, and we don't want that.

And trust me when I say: if you contribute great documentation, you will be seen on the same level as code contributors. Documentation is absolutely vital.

There are lots of ways you can add documentation:
* Docstrings in the code
* Improving our current documentation (the README, this file, our Read the Docs pages)
* Translating documentation

And much more!

# Fix Bugs
Visit our GitHub issues page to find all the bugs Ciphey has! Squash them and you'll be added to the contributors list ;)

# Refactor the code base
Not all of Ciphey follows PEP 8, and some of the code is repeated.

# How to contribute
Ciphey is always in need of more decryption tools!
1. Write a decryption tool (this can include encodings such as Base64 too). Make sure it has a `decrypt` function and is a class.
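As a rough illustration of step 1 above, a decryption module is a class whose `decrypt` method returns a result dict. Everything here is a hypothetical sketch, not Ciphey's actual interface; check the interface documentation before writing a real module.

```python
import codecs


class Rot13Decrypter:
    """Illustrative decryption module: tries ROT13 and reports whether
    the result looks like plaintext according to a supplied checker."""

    def __init__(self, language_checker):
        # `language_checker` is any callable str -> bool that judges plaintext
        self.lc = language_checker

    def decrypt(self, text: str) -> dict:
        result = codecs.decode(text, "rot_13")
        return {
            "lc": self.lc,
            "IsPlaintext?": self.lc(result),
            "Plaintext": result,
            "Cipher": "ROT13",
        }
```

Usage: pass in any plaintext judge, e.g. `Rot13Decrypter(lambda s: "the" in s.lower()).decrypt("Gur dhvpx sbk")`.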
@@ -0,0 +1,50 @@
# This is an example PKGBUILD file. Use this as a start to creating your own,
# and remove these comments. For more information, see 'man PKGBUILD'.
# NOTE: Please fill out the license field for your package! If it is unknown,
# then please put 'unknown'.

# Maintainer: Ciphey <brandon@skerritt.blog>
pkgname=Ciphey
pkgver='4.2.1'
pkgrel=1
pkgdesc="Automated decryption tool"
arch=('any')
url="https://github.com/ciphey/ciphey"
license=('MIT')
depends=('python>=3.7')
makedepends=('python>=3.7')
# checkdepends=()
# optdepends=()
# provides=()
# conflicts=()
# replaces=()
# backup=('')
# options=()
install=
changelog=
source=("$pkgname-$pkgver.tar.gz"
        "$pkgname-$pkgver.patch")
noextract=()
md5sums=()
validpgpkeys=()

prepare() {
  cd "$pkgname-$pkgver"
  patch -p1 -i "$srcdir/$pkgname-$pkgver.patch"
}

build() {
  cd "$pkgname-$pkgver"
  ./configure --prefix=/usr
  make
}

check() {
  cd "$pkgname-$pkgver"
  make -k check
}

package() {
  cd "$pkgname-$pkgver"
  make DESTDIR="$pkgdir/" install
}
README.md
@@ -1,61 +1,126 @@
<p align="center">
  ➡️
  <a href="https://docs.ciphey.online">Documentation</a> |
  <a href="https://discord.ciphey.online">Discord</a> |
  <a href="https://docs.ciphey.online/en/latest/install.html">Installation Guide</a>
  ⬅️
  <br>
  <img src="Pictures_for_README/binoculars.png" alt="Ciphey">
</p>

<p align="center">
  <img alt="GitHub commit activity" src="https://img.shields.io/github/commit-activity/m/ciphey/ciphey">
  <img src="https://pepy.tech/badge/ciphey">
  <img src="https://pepy.tech/badge/ciphey/month">
  <a href="https://discord.gg/wM3scnc"><img alt="Discord" src="https://img.shields.io/discord/728245678895136898"></a>
  <a href="https://pypi.org/project/ciphey/"><img src="https://img.shields.io/pypi/v/ciphey.svg"></a>
  <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="Ciphey">
  <img src="https://github.com/brandonskerritt/Ciphey/workflows/Python%20application/badge.svg?branch=master" alt="Ciphey">
  <img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/ciphey">
  <img src="https://codecov.io/gh/ciphey/ciphey/branch/master/graph/badge.svg">
  <a href="https://ciphey.readthedocs.io/"><img src="https://readthedocs.org/projects/ciphey/badge/"></a>
  <img src="https://img.shields.io/badge/all_contributors-1-orange.svg?style=flat-square">
  <br>
  Fully automated decryption tool using natural language processing & artificial intelligence, along with some common sense.
</p>
<hr>

## [Installation Guide](https://docs.ciphey.online/en/latest/install.html)

| <p align="center"><a href="https://pypi.org/project/ciphey">🐍 Python (Universal) </a></p> | <p align="center"><a href="https://pypi.org/project/ciphey"> Arch </a></p> | <p align="center"><a href="https://pypi.org/project/ciphey"> Windows </a></p> | <p align="center"><a href="https://pypi.org/project/ciphey"> Mac OS </a></p> |
| ----------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| <p align="center"><img src="Pictures_for_README/python.png" /></p> | <p align="center"><img src="Pictures_for_README/arch.png" /></p> | <p align="center"><img src="Pictures_for_README/windows.png" /></p> | <p align="center"><img src="Pictures_for_README/apple.png" /></p> |
| `python3 -m pip install ciphey --upgrade` | `yay ciphey` | `winget ciphey` | `brew ciphey` |

| Linux | Mac OS | Windows |
| ----------- | ------ | ----------- |
| ![GitHub Workflow Status](https://img.shields.io/github/workflow/status/ciphey/ciphey/Python%20application?label=Linux) | ![GitHub Workflow Status](https://img.shields.io/github/workflow/status/ciphey/ciphey/Python%20application?label=Mac%20OS) | ![GitHub Workflow Status](https://img.shields.io/github/workflow/status/ciphey/ciphey/Python%20application?label=Windows) |

<hr>

# 🤔 What is this?
Ciphey is an automated decryption tool. Input encrypted text, get the decrypted text back.

> "What type of encryption?"

That's the point. You don't know, you just know it's possibly encrypted. Ciphey will figure it out for you.

Ciphey can solve most things in about 2 seconds.

<p align="center" href="https://asciinema.org/a/336257">
  <img src="Pictures_for_README/index.gif" alt="Ciphey demo">
</p>

**The technical part.** Ciphey uses a custom-built artificial intelligence module (_AuSearch_) with a _Cipher Detection Interface_ to approximate what something is encrypted with, and then a custom-built, customisable natural language processing _Language Checker Interface_, which can detect when the given text becomes plaintext.

And that's just the tip of the iceberg. For the full technical explanation, check out our [documentation](https://docs.ciphey.online/en/latest/howWork.html).

# ✨ Features

- **20+ encryptions supported** such as hashes, encodings (binary, Base64) and normal encryptions like the Caesar cipher, Transposition and more. **[For the full list, click here](https://docs.ciphey.online/en/latest/ciphers.html)**
- **Custom Built Artificial Intelligence with Augmented Search (AuSearch) for answering the question "what encryption was used?"** Resulting in decryptions taking less than 3 seconds.
- **Custom built natural language processing module** Ciphey can determine whether something is plaintext or not. It has an incredibly high accuracy, along with being fast.
- **Multi Language Support** at present, only German & English (with AU, UK, CAN, USA variants).
- **Supports hashes & encryptions** which alternatives such as CyberChef Magic do not.
- **[C++ core](https://github.com/Ciphey/CipheyCore)** Blazingly fast.
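One of the "normal encryptions" in the feature list, the Caesar cipher, is simple enough to brute-force in a few lines. This standalone sketch (not Ciphey's actual implementation, which uses its language checker rather than a crib word) tries all 26 shifts and keeps the first candidate containing an expected word:

```python
import string
from typing import Optional


def caesar_bruteforce(ciphertext: str, crib: str) -> Optional[str]:
    """Try all 26 shifts; return the first decryption containing `crib`."""
    lower = string.ascii_lowercase
    for shift in range(26):
        shifted = lower[shift:] + lower[:shift]
        table = str.maketrans(lower + lower.upper(), shifted + shifted.upper())
        candidate = ciphertext.translate(table)
        if crib in candidate.lower():
            return candidate
    return None
```

Ciphey automates exactly the step this sketch fakes with a crib: deciding which of the 26 candidates is real language.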
# 🔭 Ciphey vs CyberChef

## 🔁 Base64 Encoded 64 times

<table>
  <tr>
    <th>Name</th>
    <th>⚡ Ciphey ⚡</th>
    <th>🐢 CyberChef 🐢</th>
  </tr>
  <tr>
    <th>Gif</th>
    <td><img src="Pictures_for_README/ciphey_vs_cyberchef.gif" alt="The guy she tells you not to worry about"></td>
    <td><img src="Pictures_for_README/not_dying.gif" alt="You"></td>
  </tr>
  <tr>
    <th>Time</th>
    <td>4 seconds</td>
    <td>6 seconds</td>
  </tr>
  <tr>
    <th>Setup</th>
    <td><ul><li>Set the regex param to "{"</li></ul></td>
    <td><ul><li>Set the regex param to "{"</li><li>You need to know how many times to recurse</li><li>You need to know it's Base64 all the way down</li><li>You need to load CyberChef (it's a bloated JS app)</li><li>Know enough about CyberChef to create this pipeline</li><li>Invert the match</li></ul></td>
  </tr>
</table>

<sub><b>Note</b> The gifs may load at different times, so one may appear significantly faster than another.</sub><br>
<sub><b>A note on magic</b> CyberChef's most similar feature to Ciphey is Magic. Magic fails instantly on this input and crashes. The only way we could force CyberChef to compete was to manually define it.</sub>

We also tested CyberChef and Ciphey with a **6 GB file**. Ciphey cracked it in **5 minutes and 54 seconds**. CyberChef crashed before it even started.
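For reference, input like the one in the "Base64 Encoded 64 times" comparison above can be generated (and manually inverted) with a few lines of Python. This is only a sketch to show what the benchmark input looks like, not the exact test data used:

```python
import base64


def encode_n_times(plaintext: str, n: int = 64) -> str:
    """Base64-encode a string n times over, as in the benchmark input."""
    data = plaintext.encode()
    for _ in range(n):
        data = base64.b64encode(data)
    return data.decode()


def decode_n_times(blob: str, n: int = 64) -> str:
    """Invert encode_n_times by decoding n times; you must know n."""
    data = blob.encode()
    for _ in range(n):
        data = base64.b64decode(data)
    return data.decode()
```

The point of the comparison: `decode_n_times` needs `n` up front, which is exactly the knowledge Ciphey does not require.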
## 📊 Ciphey vs Katana vs CyberChef Magic

| **Name** | ⚡ Ciphey ⚡ | 🤡 Katana 🤡 | 🐢 CyberChef Magic 🐢 |
| ------------------------------------------ | ---------- | ---------- | ------------------- |
| Advanced Language Checker | ✅ | ❌ | ✅ |
| Supports Encryptions | ✅ | ✅ | ❌ |
| Releases named after Dystopian themes 🌃 | ✅ | ❌ | ❌ |
| Supports hashes | ✅ | ✅ | ❌ |
| Easy to set up | ✅ | ❌ | ✅ |
| Can guess what something is encrypted with | ✅ | ❌ | ❌ |
| Created for hackers by hackers | ✅ | ✅ | ❌ |

# 🎬 Getting Started

If you're having trouble with installing Ciphey, [read this.](https://docs.ciphey.online/en/latest/install.html)

## ‼️ Important Links (Docs, Installation guide, Discord Support)

| Installation Guide | Documentation | Discord |
| ------------------ | ------------- | ------- |
| 📖 [Installation Guide](https://docs.ciphey.online/en/latest/install.html) | 📚 [Documentation](https://docs.ciphey.online) | 🦜 [Discord](https://discord.ciphey.online) |

## Running Ciphey
There are 3 ways to run Ciphey.
1. File input: `ciphey - encrypted.txt`
2. Unqualified input: `ciphey -- "Encrypted input"`
@@ -63,26 +128,35 @@ There are 3 ways to run Ciphey.

![Gif showing 3 ways to run Ciphey](Pictures_for_README/3ways.gif)

To get rid of the progress bars, probability table, and all the noise, use quiet mode.

```ciphey -t "encrypted text here" -q```

For a full list of arguments, run `ciphey --help`.

### ⚗️ Importing Ciphey
You can import Ciphey's main and use it in your own programs and code: `from Ciphey.__main__ import main`

This feature is expected to expand in the next major version.

# 🎪 Contributors
Ciphey was invented by [Brandon Skerritt](https://github.com/brandonskerritt) in 2008, and revived in 2019. Ciphey wouldn't be where it is today without [Cyclic3](https://github.com/Cyclic3), president of UoL's Cyber Security Society.

Ciphey was revived & recreated by the [Cyber Security Society](https://www.cybersoc.cf/) for use in CTFs. If you're ever in Liverpool, consider giving a talk or sponsoring our events. Email us at `cybersecurity@society.liverpoolguild.org` to find out more 🤠

**Major credit** to George H for designing the searching algorithm, among other things.
**Special thanks** to [varghalladesign](https://www.facebook.com/varghalladesign) for designing the logo. Check out their other design work!

## 🐕‍🦺 [Contributing](CONTRIBUTING.md)
Don't be afraid to contribute! We have many, many things you can do to help out, each of them labelled and easily explained with examples. If you're trying to contribute but stuck, tag @brandonskerritt in the GitHub issue ✨

Alternatively, join the Discord group and send a message there (the link is in the [contributing file](CONTRIBUTING.md) and at the top of this README as a badge).

Please read the [contributing file](CONTRIBUTING.md) for exact details on how to contribute ✨

## 💰 Financial Contributors
The contributions will be used to fund not only the future of Ciphey and its authors, but also the Cyber Security Society at the University of Liverpool.

GitHub doesn't support "sponsor this project and we'll evenly distribute the money", so pick a link and we'll sort it out on our end 🥰

## ✨ Contributors

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
@@ -0,0 +1,12 @@
class Ciphey < Formula
  desc "Automated decryption tool"
  homepage ""
  url "https://github.com/Ciphey/Ciphey/archive/4.2.0.tar.gz"
  sha256 "013c438cc1f1c34c314bb202209acb36d9da142d4febeb21e5d4a06fa7b4dd7c"

  def install
    bin.install "ciphey"
  end
end
@@ -30,7 +30,7 @@ class Ascii:
            "Extra Information": None,
        }

        if self.lc.check(result):
            logger.debug(f"English found in ASCII, returning {result}")
            return {
                "lc": self.lc,
@ -1,156 +0,0 @@
|
|||
import base64
|
||||
import binascii
|
||||
from typing import Callable
|
||||
|
||||
from loguru import logger
|
||||
|
||||
import base58
|
||||
import base62
|
||||
|
||||
|
||||
class Bases:
|
||||
"""
|
||||
turns base64 strings into normal strings
|
||||
"""
|
||||
|
||||
def __init__(self, lc):
|
||||
self.lc = lc
|
||||
|
||||
def decrypt(self, text: str):
|
||||
logger.debug("Attempting base decoding")
|
||||
|
||||
bases = [
|
||||
self.base32(text),
|
||||
self.base16(text),
|
||||
self.base64(text),
|
||||
self.base85(text),
|
||||
self.ascii85(text),
|
||||
self.base58_bitcoin(text),
|
||||
self.base58_ripple(text, alphabet=base58.RIPPLE_ALPHABET),
|
||||
self.b62(text),
|
||||
]
|
||||
for answer in bases:
|
||||
try:
|
||||
if answer["IsPlaintext?"]:
|
||||
# good answer
|
||||
logger.debug(f"Returning true for {answer}")
|
||||
return answer
|
||||
except TypeError:
|
||||
continue
|
||||
# Base85
|
||||
# if nothing works, it has failed.
|
||||
return self.badRet()
|
||||
|
||||
def _dispatch(
|
||||
self, decoder: Callable[[str], bytes], text: str, cipher: str, alphabet=None
|
||||
):
|
||||
logger.trace("Attempting base64")
|
||||
result = None
|
||||
try:
|
||||
result = decoder(text) if not alphabet else decoder(text, alphabet)
|
||||
# yeet turning b strings into normal stringy bois
|
||||
result = result.decode("utf-8")
|
||||
except UnicodeDecodeError as e:
|
||||
logger.trace("Bad unicode")
|
||||
result = None
|
||||
except binascii.Error as e:
|
||||
logger.trace("binascii error")
|
||||
result = None
|
||||
except ValueError:
|
||||
logger.trace("Failed to decode base")
|
||||
result = None
|
||||
except:
|
||||
logger.trace("Failed to decode base")
|
||||
result = None
|
||||
|
||||
if result is not None and self.lc.checkLanguage(result):
|
||||
logger.debug(f"Bases successful, returning {result}")
|
||||
return self.goodRet(result, cipher=cipher)
|
||||
else:
|
||||
return self.badRet()
|
||||
|
||||
def base64(self, text: str):
|
||||
"""Base64 decode
|
||||
|
||||
args:
|
||||
text -> text to decode
|
||||
returns:
|
||||
the text decoded as base64
|
||||
"""
|
||||
logger.trace("Attempting base64")
|
||||
return self._dispatch(base64.b64decode, text, "base64")
|
||||
|
||||
def base32(self, text: str):
|
||||
"""Base32 decode
|
||||
|
||||
args:
|
||||
text -> text to decode
|
||||
returns:
|
||||
the text decoded as base32
|
||||
"""
|
||||
logger.trace("Attempting base32")
|
||||
return self._dispatch(base64.b32decode, text, "base32")
|
||||
|
||||
def base16(self, text: str):
|
||||
"""Base16 decode
|
||||
|
||||
args:
|
||||
text -> text to decode
|
||||
returns:
|
||||
the text decoded as base16
|
||||
"""
|
||||
logger.trace("Attempting base16")
|
||||
return self._dispatch(base64.b16decode, text, "base16")
|
||||
|
||||
def base85(self, text: str):
|
||||
"""Base85 decode
|
||||
|
||||
args:
|
||||
text -> text to decode
|
||||
returns:
|
||||
the text decoded as base85
|
||||
"""
|
||||
logger.trace("Attempting base85")
|
||||
return self._dispatch(base64.b85decode, text, "base85")
|
||||
|
||||
def ascii85(self, text: str):
|
||||
"""Base85 decode
|
||||
|
||||
args:
|
||||
text -> text to decode
|
||||
returns:
|
||||
the text decoded as base85
|
||||
"""
|
||||
logger.trace("Attempting ascii85")
|
||||
return self._dispatch(base64.a85decode, text, "base85")
|
||||
|
||||
def base58_bitcoin(self, text: str):
|
||||
logger.trace("Attempting Base58 Bitcoin")
|
||||
return self._dispatch(base58.b58decode, text, "base58_bitcoin")
|
||||
|
||||
def base58_ripple(self, text: str, alphabet: str):
|
||||
logger.trace("Attempting Base58 ripple alphabet")
|
||||
return self._dispatch(base58.b58decode, text, "base58_ripple", alphabet=alphabet)
|
||||
|
||||
def b62(self, text: str):
|
||||
logger.trace("Attempting base62")
|
||||
return self._dispatch(base62.decode, text, "base62")
|
||||
|
||||
def goodRet(self, result, cipher):
|
||||
logger.debug(f"Result for base is true, where result is {result}")
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": True,
|
||||
"Plaintext": result,
|
||||
"Cipher": cipher,
|
||||
"Extra Information": None,
|
||||
}
|
||||
|
||||
def badRet(self):
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": None,
|
||||
"Extra Information": None,
|
||||
}
|
|
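The core pattern in the deleted `_dispatch` method — attempt a decode and map every failure mode to `None` — can be sketched standalone. This is a minimal, hypothetical reduction of that pattern (`try_base64` is an illustrative name, not an API from the PR):

```python
import base64
import binascii


def try_base64(text: str):
    """Attempt a base64 decode; return the decoded string, or None on any failure."""
    try:
        # b64decode raises binascii.Error on bad input; .decode raises
        # UnicodeDecodeError when the result isn't valid UTF-8
        return base64.b64decode(text).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


print(try_base64("aGVsbG8="))      # → hello
print(try_base64("not base64!!"))  # → None
```

The caller never sees an exception; a `None` result simply means "this encoding didn't apply", which lets the `decrypt` loop try the next base without special-casing errors.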
@ -1,62 +0,0 @@
|
|||
import binascii
|
||||
|
||||
from loguru import logger
|
||||
|
||||
|
||||
class Binary:
|
||||
def __init__(self, lc):
|
||||
self.lc = lc
|
||||
|
||||
def decrypt(self, text):
|
||||
logger.debug("Attempting to decrypt binary")
|
||||
try:
|
||||
result = self.decode(text)
|
||||
except ValueError as e:
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": None,
|
||||
"Extra Information": None,
|
||||
}
|
||||
except TypeError as e:
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": None,
|
||||
"Extra Information": None,
|
||||
}
|
||||
|
||||
if self.lc.checkLanguage(result):
|
||||
logger.debug(f"Answer found for binary")
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": True,
|
||||
"Plaintext": result,
|
||||
"Cipher": "Ascii to Binary encoded",
|
||||
"Extra Information": None,
|
||||
}
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": None,
|
||||
"Extra Information": None,
|
||||
}
|
||||
|
||||
def decode(self, text):
|
||||
"""
|
||||
Decodes into binary using .encode()
|
||||
"""
|
||||
text = text.replace(" ", "")
|
||||
# to a bytes string
|
||||
text = text.encode("utf-8")
|
||||
|
||||
# into base 2
|
||||
n = int(text, 2)
|
||||
|
||||
# into ascii
|
||||
text = n.to_bytes((n.bit_length() + 7) // 8, "big").decode()
|
||||
|
||||
return text
|
|
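The `decode` method's trick — parse the whole bitstring as one base-2 integer, then re-serialise it as bytes — can be shown in isolation (a minimal sketch; `decode_binary` is an illustrative name, not part of the module):

```python
def decode_binary(text: str) -> str:
    """Decode a space-separated (or continuous) binary string into ASCII text."""
    bits = text.replace(" ", "")
    # parse the entire bitstring as one base-2 integer
    n = int(bits, 2)
    # round bit_length up to whole bytes, then decode big-endian bytes as text
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode()


print(decode_binary("01101000 01101001"))  # → hi
```

Note that `int(bits, 2)` drops leading zero bits, so the byte-count arithmetic relies on the first character's high bits; for printable ASCII this works out, which is the regime the checker targets.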
```diff
@@ -1,4 +1,3 @@
 from .bases import Bases
 from .binary import Binary
 from .hexadecimal import Hexadecimal
 from .ascii import Ascii
@@ -11,7 +10,6 @@ from loguru import logger
 class EncodingParent:
     def __init__(self, lc):
         self.lc = lc
         self.base64 = Bases(self.lc)
         self.binary = Binary(self.lc)
         self.hex = Hexadecimal(self.lc)
         self.ascii = Ascii(self.lc)
@@ -40,7 +38,7 @@ class EncodingParent:

         for answer in answers:
             logger.debug(f"All answers are {answers}")
-            # adds the LC objects together
+            # adds the Checkers objects together
             # self.lc = self.lc + answer["lc"]
             if answer is not None and answer["IsPlaintext?"]:
                 logger.debug(f"Plaintext found {answer}")
```
@ -1,29 +0,0 @@
|
|||
from loguru import logger
|
||||
|
||||
|
||||
class Hexadecimal:
|
||||
def __init__(self, lc):
|
||||
self.lc = lc
|
||||
|
||||
def decrypt(self, text):
|
||||
logger.debug("Attempting hexadecimal decryption")
|
||||
try:
|
||||
result = bytearray.fromhex(text).decode()
|
||||
except ValueError as e:
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": None,
|
||||
"Extra Information": None,
|
||||
}
|
||||
|
||||
if self.lc.checkLanguage(result):
|
||||
logger.debug(f"Hexadecimal successful, returning {result}")
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": True,
|
||||
"Plaintext": result,
|
||||
"Cipher": "Ascii to Hexadecimal encoded",
|
||||
"Extra Information": None,
|
||||
}
|
|
```diff
@@ -0,0 +1,27 @@
+class letters:
+
+    """Deals with Nato Strings / first letter of every word"""
+
+    def __init__(self):
+        None
+
+    def __name__(self):
+        return "Letters"
+
+    def decrypt(self, text: str) -> dict:
+        return text
+
+    def first_letter_every_word(self, text):
+        """
+        This should be supplied a string like "hello my name is"
+        """
+
+        text = text.split(".")
+
+        new_text = []
+        for sentence in text:
+            for word in sentence.split(" "):
+                new_text.append(word[0])
+            # Applies a space after every sentence
+            # which might be every word
+            new_text.append(" ")
```
@ -1,61 +0,0 @@
|
|||
from loguru import logger
|
||||
import cipheydists
|
||||
|
||||
|
||||
class MorseCode:
|
||||
def __init__(self, lc):
|
||||
self.lc = lc
|
||||
self.ALLOWED = {".", "-", " ", "/", "\n"}
|
||||
self.MORSE_CODE_DICT = dict(cipheydists.get_charset("morse"))
|
||||
self.MORSE_CODE_DICT_INV = {v: k for k, v in self.MORSE_CODE_DICT.items()}
|
||||
|
||||
def decrypt(self, text):
|
||||
logger.debug("Attempting morse code")
|
||||
if not self.checkIfMorse(text):
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": "Morse Code",
|
||||
"Extra Information": None,
|
||||
}
|
||||
try:
|
||||
result = self.unmorse_it(text)
|
||||
except TypeError as e:
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": "Morse Code",
|
||||
"Extra Information": None,
|
||||
}
|
||||
logger.debug(f"Morse code successful, returning {result}")
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": True,
|
||||
"Plaintext": result,
|
||||
"Cipher": "Morse Code",
|
||||
"Extra Information": None,
|
||||
}
|
||||
|
||||
def checkIfMorse(self, text):
|
||||
count = 0
|
||||
for i in text:
|
||||
if i in self.ALLOWED:
|
||||
count += 1
|
||||
return count / len(text) > 0.625
|
||||
|
||||
def unmorse_it(self, text):
|
||||
returnMsg = ""
|
||||
for word in text.split("/"):
|
||||
for char in word.strip().split():
|
||||
# translates every letter
|
||||
try:
|
||||
m = self.MORSE_CODE_DICT_INV[char]
|
||||
except KeyError:
|
||||
m = ""
|
||||
returnMsg = returnMsg + m
|
||||
# after every word add a space
|
||||
# after every word add a space
|
||||
returnMsg = returnMsg + " "
|
||||
return returnMsg.strip().upper()
|
|
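The `checkIfMorse` heuristic — flag the input as Morse only when more than 62.5% of its characters come from the Morse alphabet (dots, dashes, spaces, slashes, newlines) — can be sketched on its own. A minimal standalone version, with a hypothetical function name:

```python
ALLOWED = {".", "-", " ", "/", "\n"}


def looks_like_morse(text: str) -> bool:
    """Return True when over 62.5% of the characters are Morse symbols."""
    count = sum(1 for ch in text if ch in ALLOWED)
    return count / len(text) > 0.625


print(looks_like_morse(".... . .-.. .-.. ---"))  # → True
print(looks_like_morse("hello world"))           # → False
```

The 0.625 threshold is a cheap pre-filter: it lets the expensive dictionary translation run only on inputs that plausibly are Morse, which is what the commit log's "Morse optimisation" refers to.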
```diff
@@ -46,11 +46,11 @@ class Octal:
         }

     def decode(self, text):
-        '''
+        """
         It takes an octal string and return a string
         :octal_str: octal str like "110 145 154"
-        '''
+        """

         str_converted = ""
         for octal_char in text.split(" "):
             str_converted += chr(int(octal_char, 8))
```
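The octal `decode` loop above converts each space-separated octal token to a character via `chr(int(token, 8))`. A self-contained sketch of the same conversion (`decode_octal` is an illustrative name, not the class's API):

```python
def decode_octal(text: str) -> str:
    """Turn a space-separated octal string like "110 145 154" into text."""
    # int(token, 8) parses the token as base 8; chr maps the code point to a char
    return "".join(chr(int(octal_char, 8)) for octal_char in text.split(" "))


print(decode_octal("110 145 154 154 157"))  # → Hello
```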
```diff
@@ -99,7 +99,7 @@ def crack(hashvalue, lc):
     if len(hashvalue) == 32:
         for api in md5:
             r = api(hashvalue, "md5")
-            result = lc.checkLanguage(r) if r is not None else None
+            result = lc.check(r) if r is not None else None
             if result is not None or r is not None:
                 logger.debug(f"MD5 returns True {r}")
                 return {
@@ -112,7 +112,7 @@ def crack(hashvalue, lc):
     elif len(hashvalue) == 40:
         for api in sha1:
             r = api(hashvalue, "sha1")
-            result = lc.checkLanguage(r) if r is not None else None
+            result = lc.check(r) if r is not None else None
             if result is not None and r is not None:
                 logger.debug(f"sha1 returns true")
                 return {
@@ -125,7 +125,7 @@ def crack(hashvalue, lc):
     elif len(hashvalue) == 64:
         for api in sha256:
             r = api(hashvalue, "sha256")
-            result = lc.checkLanguage(r) if r is not None else None
+            result = lc.check(r) if r is not None else None
             if result is not None and r is not None:
                 logger.debug(f"sha256 returns true")
                 return {
@@ -138,7 +138,7 @@ def crack(hashvalue, lc):
     elif len(hashvalue) == 96:
         for api in sha384:
             r = api(hashvalue, "sha384")
-            result = lc.checkLanguage(r) if r is not None else None
+            result = lc.check(r) if r is not None else None
             if result is not None and r is not None:
                 logger.debug(f"sha384 returns true")
                 return {
@@ -151,7 +151,7 @@ def crack(hashvalue, lc):
     elif len(hashvalue) == 128:
         for api in sha512:
             r = api(hashvalue, "sha512")
-            result = lc.checkLanguage(r) if r is not None else None
+            result = lc.check(r) if r is not None else None
            if result is not None and r is not None:
                logger.debug(f"sha512 returns true")
                return {
```
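The repeated `len(hashvalue) == …` branches in `crack` dispatch on hex-digest length: 32 chars implies MD5, 40 SHA-1, 64 SHA-256, 96 SHA-384, 128 SHA-512. A table-driven sketch of that dispatch (a hypothetical helper, not code from the PR):

```python
import hashlib

# hex-digest length → likely algorithm, mirroring the if/elif chain in crack()
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256", 96: "sha384", 128: "sha512"}


def guess_hash_type(hashvalue: str):
    """Return the likely hash algorithm for a hex digest, or None if unrecognised."""
    return HASH_LENGTHS.get(len(hashvalue))


print(guess_hash_type(hashlib.md5(b"abc").hexdigest()))     # → md5
print(guess_hash_type(hashlib.sha256(b"abc").hexdigest()))  # → sha256
```

Digest length alone cannot distinguish algorithms that share a length (e.g. SHA-256 vs. other 256-bit hashes), which is why the original code still tries several lookup APIs per length.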
@ -1,117 +0,0 @@
|
|||
"""
|
||||
██████╗██╗██████╗ ██╗ ██╗███████╗██╗ ██╗
|
||||
██╔════╝██║██╔══██╗██║ ██║██╔════╝╚██╗ ██╔╝
|
||||
██║ ██║██████╔╝███████║█████╗ ╚████╔╝
|
||||
██║ ██║██╔═══╝ ██╔══██║██╔══╝ ╚██╔╝
|
||||
╚██████╗██║██║ ██║ ██║███████╗ ██║
|
||||
© Brandon Skerritt
|
||||
https://github.com/brandonskerritt/ciphey
|
||||
"""
|
||||
try:
|
||||
import Decryptor.basicEncryption.caesar as ca
|
||||
import Decryptor.basicEncryption.reverse as re
|
||||
import Decryptor.basicEncryption.vigenere as vi
|
||||
import Decryptor.basicEncryption.pigLatin as pi
|
||||
import Decryptor.basicEncryption.transposition as tr
|
||||
except ModuleNotFoundError:
|
||||
import ciphey.Decryptor.basicEncryption.caesar as ca
|
||||
import ciphey.Decryptor.basicEncryption.reverse as re
|
||||
import ciphey.Decryptor.basicEncryption.vigenere as vi
|
||||
import ciphey.Decryptor.basicEncryption.pigLatin as pi
|
||||
import ciphey.Decryptor.basicEncryption.transposition as tr
|
||||
|
||||
"""
|
||||
So I want to assign the prob distribution to objects
|
||||
so it makes sense to do this?
|
||||
list of objects
|
||||
for each item in the prob distribution
|
||||
replace that with the appropriate object in the list?
|
||||
So each object has a getName func that returns the name as a str
|
||||
|
||||
new_prob_dict = {}
|
||||
for key, val in self.prob:
|
||||
for obj in list:
|
||||
if obj.getName() == key:
|
||||
new_prob_dict[obj] = val
|
||||
|
||||
But I don't need to do all this, do I?
|
||||
The dict comes in already sorted.
|
||||
So why do I need the probability values if it's sorted?
|
||||
It'd be easier if I make a list in the same order as the dict?
|
||||
sooo
|
||||
|
||||
list_objs = [caeser, etc]
|
||||
counter = 0
|
||||
for key, val in self.prob:
|
||||
for listCounter, item in enumerate(list_objs):
|
||||
if item.getName() == key:
|
||||
# moves the item
|
||||
list_objs.insert(counter, list_objs.pop(listCounter))
|
||||
counter = counter + 1
|
||||
|
||||
Eventually we get a sorted list of obj
|
||||
"""
|
||||
|
||||
|
||||
class BasicParent:
|
||||
def __init__(self, lc):
|
||||
self.lc = lc
|
||||
self.caesar = ca.Caesar(self.lc)
|
||||
self.reverse = re.Reverse(self.lc)
|
||||
self.vigenere = vi.Vigenere(self.lc)
|
||||
self.pig = pi.PigLatin(self.lc)
|
||||
self.trans = tr.Transposition(self.lc)
|
||||
|
||||
self.list_of_objects = [self.caesar, self.reverse, self.pig, self.trans]
|
||||
|
||||
def decrypt(self, text):
|
||||
self.text = text
|
||||
from multiprocessing.dummy import Pool as ThreadPool
|
||||
|
||||
pool = ThreadPool(16)
|
||||
answers = pool.map(self.callDecrypt, self.list_of_objects)
|
||||
|
||||
"""for item in self.list_of_objects:
|
||||
result = item.decrypt(text)
|
||||
answers.append(result)"""
|
||||
for answer in answers:
|
||||
# adds the LC objects together
|
||||
# self.lc = self.lc + answer["lc"]
|
||||
if answer["IsPlaintext?"]:
|
||||
return answer
|
||||
|
||||
# so vigenere runs ages
|
||||
# and you cant kill threads in a pool
|
||||
# so i just run it last lol]
|
||||
#
|
||||
# Not anymore! #basedcore
|
||||
|
||||
result = self.callDecrypt(self.vigenere)
|
||||
if result["IsPlaintext?"]:
|
||||
return result
|
||||
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": None,
|
||||
"Extra Information": None,
|
||||
}
|
||||
|
||||
def callDecrypt(self, obj):
|
||||
# i only exist to call decrypt
|
||||
return obj.decrypt(self.text)
|
||||
|
||||
def setProbTable(self, prob):
|
||||
"""I'm still writing this"""
|
||||
self.probabilityDistribution = prob
|
||||
# we get a sorted list of objects :)
|
||||
counter = 0
|
||||
for key, val in self.probabilityDistribution.items():
|
||||
for listCounter, item in enumerate(self.list_of_objects):
|
||||
if item.getName() == key:
|
||||
# moves the item
|
||||
list_objs.insert(counter, list_objs.pop(listCounter))
|
||||
counter = counter + 1
|
||||
def __name__(self):
|
||||
return "basicParent"
|
|
```diff
@@ -36,7 +36,7 @@ class Caesar:

         for candidate in possible_keys:
             translated = cipheycore.caesar_decrypt(message, candidate.key, group)
-            result = self.lc.checkLanguage(translated)
+            result = self.lc.check(translated)
             if result:
                 logger.debug(f"Caesar cipher returns true {result}")
                 return {
```
```diff
@@ -49,7 +49,7 @@ class PigLatin:

         # TODO find a way to return 2 variables
         # this returns 2 variables in a tuple
-        if self.lc.checkLanguage(message3AY):
+        if self.lc.check(message3AY):
             logger.debug("Pig latin 3AY returns True")
             return {
                 "lc": self.lc,
@@ -58,7 +58,7 @@ class PigLatin:
                 "Cipher": "Pig Latin",
                 "Extra Information": None,
             }
-        elif self.lc.checkLanguage(messagepigWAY):
+        elif self.lc.check(messagepigWAY):
             logger.debug("Pig latin WAY returns True")
             return {
                 "lc": self.lc,
```
@ -1,42 +0,0 @@
|
|||
import sys
|
||||
|
||||
sys.path.append("..")
|
||||
try:
|
||||
import mathsHelper as mh
|
||||
except ModuleNotFoundError:
|
||||
import ciphey.mathsHelper as mh
|
||||
from loguru import logger
|
||||
|
||||
|
||||
class Reverse:
|
||||
def __init__(self, lc):
|
||||
self.lc = lc
|
||||
self.mh = mh.mathsHelper()
|
||||
|
||||
def decrypt(self, message):
|
||||
logger.debug("In reverse")
|
||||
message = self.mh.strip_puncuation(message)
|
||||
|
||||
message = message[::-1]
|
||||
result = self.lc.checkLanguage(message)
|
||||
if result:
|
||||
logger.debug("Reverse returns True")
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": True,
|
||||
"Plaintext": message,
|
||||
"Cipher": "Reverse",
|
||||
"Extra Information": None,
|
||||
}
|
||||
else:
|
||||
logger.debug(f"Reverse returns False")
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": "Reverse",
|
||||
"Extra Information": None,
|
||||
}
|
||||
|
||||
def getName(self):
|
||||
return "Reverse"
|
|
```diff
@@ -34,7 +34,7 @@ class Transposition:
             logger.debug(f"Transposition trying key {key}")
             decryptedText = self.decryptMessage(key, message)
             # if decrypted english is found, return them
-            result = self.lc.checkLanguage(decryptedText)
+            result = self.lc.check(decryptedText)
             if result:
                 logger.debug("transposition returns true")
                 return {
```
@ -1,224 +0,0 @@
|
|||
import itertools, re
|
||||
import cipheycore
|
||||
import cipheydists
|
||||
|
||||
|
||||
class Vigenere:
|
||||
def __init__(self, lc):
|
||||
self.LETTERS = "abcdefghijklmnopqrstuvwxyz"
|
||||
self.SILENT_MODE = True # If set to True, program doesn't print anything.
|
||||
self.NUM_MOST_FREQ_LETTERS = 4 # Attempt this many letters per subkey.
|
||||
self.MAX_KEY_LENGTH = 16 # Will not attempt keys longer than this.
|
||||
self.NONLETTERS_PATTERN = re.compile("[^A-Z]")
|
||||
|
||||
self.lc = lc
|
||||
|
||||
def decrypt(self, text):
|
||||
result = self.hackVigenere(text)
|
||||
if result is None:
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": False,
|
||||
"Plaintext": None,
|
||||
"Cipher": "Viginere",
|
||||
"Extra Information": None,
|
||||
}
|
||||
|
||||
return result
|
||||
|
||||
def findRepeatSequencesSpacings(self, message):
|
||||
# Goes through the message and finds any 3 to 5 letter sequences
|
||||
# that are repeated. Returns a dict with the keys of the sequence and
|
||||
# values of a list of spacings (num of letters between the repeats).
|
||||
|
||||
# Use a regular expression to remove non-letters from the message:
|
||||
message = self.NONLETTERS_PATTERN.sub("", message.upper())
|
||||
|
||||
# Compile a list of seqLen-letter sequences found in the message:
|
||||
seqSpacings = {} # Keys are sequences, values are lists of int spacings.
|
||||
for seqLen in range(3, 6):
|
||||
for seqStart in range(len(message) - seqLen):
|
||||
# Determine what the sequence is, and store it in seq:
|
||||
seq = message[seqStart : seqStart + seqLen]
|
||||
|
||||
# Look for this sequence in the rest of the message:
|
||||
for i in range(seqStart + seqLen, len(message) - seqLen):
|
||||
if message[i : i + seqLen] == seq:
|
||||
# Found a repeated sequence.
|
||||
if seq not in seqSpacings:
|
||||
seqSpacings[seq] = [] # Initialize a blank list.
|
||||
|
||||
# Append the spacing distance between the repeated
|
||||
# sequence and the original sequence:
|
||||
seqSpacings[seq].append(i - seqStart)
|
||||
return seqSpacings
|
||||
|
||||
def getUsefulFactors(self, num):
|
||||
# Returns a list of useful factors of num. By "useful" we mean factors
|
||||
# less than MAX_KEY_LENGTH + 1 and not 1. For example,
|
||||
# getUsefulFactors(144) returns [2, 3, 4, 6, 8, 9, 12, 16]
|
||||
|
||||
if num < 2:
|
||||
return [] # Numbers less than 2 have no useful factors.
|
||||
|
||||
factors = [] # The list of factors found.
|
||||
|
||||
# When finding factors, you only need to check the integers up to
|
||||
# MAX_KEY_LENGTH.
|
||||
for i in range(2, self.MAX_KEY_LENGTH + 1): # Don't test 1: it's not useful.
|
||||
if num % i == 0:
|
||||
factors.append(i)
|
||||
otherFactor = int(num / i)
|
||||
if otherFactor < self.MAX_KEY_LENGTH + 1 and otherFactor != 1:
|
||||
factors.append(otherFactor)
|
||||
return list(set(factors)) # Remove duplicate factors.
|
||||
|
||||
def getItemAtIndexOne(self, items):
|
||||
return items[1]
|
||||
|
||||
def getMostCommonFactors(self, seqFactors):
|
||||
# First, get a count of how many times a factor occurs in seqFactors:
|
||||
factorCounts = {} # Key is a factor, value is how often it occurs.
|
||||
|
||||
# seqFactors keys are sequences, values are lists of factors of the
|
||||
# spacings. seqFactors has a value like: {'GFD': [2, 3, 4, 6, 9, 12,
|
||||
# 18, 23, 36, 46, 69, 92, 138, 207], 'ALW': [2, 3, 4, 6, ...], ...}
|
||||
for seq in seqFactors:
|
||||
factorList = seqFactors[seq]
|
||||
for factor in factorList:
|
||||
if factor not in factorCounts:
|
||||
factorCounts[factor] = 0
|
||||
factorCounts[factor] += 1
|
||||
|
||||
# Second, put the factor and its count into a tuple, and make a list
|
||||
# of these tuples so we can sort them:
|
||||
factorsByCount = []
|
||||
for factor in factorCounts:
|
||||
# Exclude factors larger than MAX_KEY_LENGTH:
|
||||
if factor <= self.MAX_KEY_LENGTH:
|
||||
# factorsByCount is a list of tuples: (factor, factorCount)
|
||||
# factorsByCount has a value like: [(3, 497), (2, 487), ...]
|
||||
factorsByCount.append((factor, factorCounts[factor]))
|
||||
|
||||
# Sort the list by the factor count:
|
||||
factorsByCount.sort(key=self.getItemAtIndexOne, reverse=True)
|
||||
|
||||
return factorsByCount
|
||||
|
||||
def kasiskiExamination(self, ciphertext):
|
||||
# Find out the sequences of 3 to 5 letters that occur multiple times
|
||||
# in the ciphertext. repeatedSeqSpacings has a value like:
|
||||
# {'EXG': [192], 'NAF': [339, 972, 633], ... }
|
||||
repeatedSeqSpacings = self.findRepeatSequencesSpacings(ciphertext)
|
||||
|
||||
# (See getMostCommonFactors() for a description of seqFactors.)
|
||||
seqFactors = {}
|
||||
for seq in repeatedSeqSpacings:
|
||||
seqFactors[seq] = []
|
||||
for spacing in repeatedSeqSpacings[seq]:
|
||||
seqFactors[seq].extend(self.getUsefulFactors(spacing))
|
||||
|
||||
# (See getMostCommonFactors() for a description of factorsByCount.)
|
||||
factorsByCount = self.getMostCommonFactors(seqFactors)
|
||||
|
||||
# Now we extract the factor counts from factorsByCount and
|
||||
# put them in allLikelyKeyLengths so that they are easier to
|
||||
# use later:
|
||||
allLikelyKeyLengths = []
|
||||
for twoIntTuple in factorsByCount:
|
||||
allLikelyKeyLengths.append(twoIntTuple[0])
|
||||
|
||||
return allLikelyKeyLengths
|
||||
|
||||
def attemptHackWithKeyLength(self, ciphertext, mostLikelyKeyLength):
|
||||
# Determine the most likely letters for each letter in the key:
|
||||
ciphertext = ciphertext.lower()
|
||||
|
||||
# Do core work
|
||||
group = cipheydists.get_charset("english")["lcase"]
|
||||
expected = cipheydists.get_dist("lcase")
|
||||
possible_keys = cipheycore.vigenere_crack(
|
||||
ciphertext, expected, group, mostLikelyKeyLength
|
||||
)
|
||||
n_keys = len(possible_keys)
|
||||
|
||||
# Try all the feasible keys
|
||||
for candidate in possible_keys:
|
||||
nice_key = list(candidate.key)
|
||||
# Create a possible key from the letters in allFreqScores:
|
||||
if not self.SILENT_MODE:
|
||||
print("Attempting with key: %s" % nice_key)
|
||||
|
||||
decryptedText = cipheycore.vigenere_decrypt(
|
||||
ciphertext, candidate.key, group
|
||||
)
|
||||
|
||||
if self.lc.checkLanguage(decryptedText):
|
||||
# Set the hacked ciphertext to the original casing:
|
||||
origCase = []
|
||||
for i in range(len(ciphertext)):
|
||||
if ciphertext[i].isupper():
|
||||
origCase.append(decryptedText[i].upper())
|
||||
else:
|
||||
origCase.append(decryptedText[i].lower())
|
||||
decryptedText = "".join(origCase)
|
||||
|
||||
# Check with user to see if the key has been found:
|
||||
return {
|
||||
"lc": self.lc,
|
||||
"IsPlaintext?": True,
|
||||
"Plaintext": decryptedText,
|
||||
"Cipher": "Viginere",
|
||||
"Extra Information": f"The key used is {nice_key}",
|
||||
}
|
||||
|
||||
# No English-looking decryption found, so return None:
|
||||
return None
|
||||
|
||||
def hackVigenere(self, ciphertext):
|
||||
# First, we need to do Kasiski Examination to figure out what the
|
||||
# length of the ciphertext's encryption key is:
|
||||
allLikelyKeyLengths = self.kasiskiExamination(ciphertext)
|
||||
if not self.SILENT_MODE:
|
||||
keyLengthStr = ""
|
||||
for keyLength in allLikelyKeyLengths:
|
||||
keyLengthStr += "%s " % (keyLength)
|
||||
print(
|
||||
"Kasiski Examination results say the most likely key lengths are: "
|
||||
+ keyLengthStr
|
||||
+ "\n"
|
||||
)
|
||||
hackedMessage = None
|
||||
for keyLength in allLikelyKeyLengths:
|
||||
if not self.SILENT_MODE:
|
||||
print(
|
||||
"Attempting hack with key length %s (%s possible keys)..."
|
||||
% (keyLength, self.NUM_MOST_FREQ_LETTERS ** keyLength)
|
||||
)
|
||||
hackedMessage = self.attemptHackWithKeyLength(ciphertext, keyLength)
|
||||
if hackedMessage != None:
|
||||
break
|
||||
|
||||
# If none of the key lengths we found using Kasiski Examination
|
||||
# worked, start brute-forcing through key lengths:
|
||||
if hackedMessage == None:
|
||||
if not self.SILENT_MODE:
|
||||
print(
|
||||
"Unable to hack message with likely key length(s). Brute forcing key length..."
|
||||
)
|
||||
for keyLength in range(1, self.MAX_KEY_LENGTH + 1):
|
||||
# Don't re-check key lengths already tried from Kasiski:
|
||||
if keyLength not in allLikelyKeyLengths:
|
||||
if not self.SILENT_MODE:
|
||||
print(
|
||||
"Attempting hack with key length %s (%s possible keys)..."
|
||||
% (keyLength, self.NUM_MOST_FREQ_LETTERS ** keyLength)
|
||||
)
|
||||
hackedMessage = self.attemptHackWithKeyLength(ciphertext, keyLength)
|
||||
if hackedMessage != None:
|
||||
break
|
||||
return hackedMessage
|
||||
|
||||
def getName(self):
|
||||
return "Viginere"
|
||||
|
|
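The Kasiski step in the deleted `getUsefulFactors` keeps only factors between 2 and `MAX_KEY_LENGTH`, since those are the only candidate key lengths worth testing. A standalone sketch of the same logic (hypothetical function name, same 16-length cap as the class):

```python
MAX_KEY_LENGTH = 16  # same cap as the deleted Vigenere class


def useful_factors(num: int):
    """Factors of num in [2, MAX_KEY_LENGTH], deduplicated and sorted."""
    if num < 2:
        return []  # numbers below 2 have no useful factors
    factors = set()
    for i in range(2, MAX_KEY_LENGTH + 1):
        if num % i == 0:
            factors.add(i)
            other = num // i  # the cofactor may also be a plausible key length
            if 1 < other <= MAX_KEY_LENGTH:
                factors.add(other)
    return sorted(factors)


print(useful_factors(144))  # → [2, 3, 4, 6, 8, 9, 12, 16]
```

This matches the example given in the original comment (`getUsefulFactors(144)` returns those eight factors); the spacings between repeated trigrams share a factor equal to the key length, which is what makes this filter effective.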
@ -1,4 +0,0 @@
|
|||
|
||||
# Re-expose interface for lazy people
|
||||
from .iface import LanguageChecker
|
||||
|
|
@ -1,221 +0,0 @@
|
|||
"""
|
||||
██████╗██╗██████╗ ██╗ ██╗███████╗██╗ ██╗
|
||||
██╔════╝██║██╔══██╗██║ ██║██╔════╝╚██╗ ██╔╝
|
||||
██║ ██║██████╔╝███████║█████╗ ╚████╔╝
|
||||
██║ ██║██╔═══╝ ██╔══██║██╔══╝ ╚██╔╝
|
||||
╚██████╗██║██║ ██║ ██║███████╗ ██║
|
||||
© Brandon Skerritt
|
||||
Github: brandonskerritt
|
||||
|
||||
Class to determine whether somethine is English or not.
|
||||
1. Calculate the Chi Squared score of a sentence
|
||||
2. If the score is significantly lower than the average score, it _might_ be English
|
||||
2.1. If the score _might_ be English, then take the text and compare it to the sorted dictionary
|
||||
in O(n log n) time.
|
||||
It creates a percentage of "How much of this text is in the dictionary?"
|
||||
The dictionary contains:
|
||||
* 20,000 most common US words
|
||||
* 10,000 most common UK words (there's no repition between the two)
|
||||
* The top 10,000 passwords
|
||||
If the word "Looks like" English (chi-squared) and if it contains English words, we can conclude it is
|
||||
very likely English. The alternative is doing the dictionary thing but with an entire 479k word dictionary (slower)
|
||||
2.2. If the score is not English, but we haven't tested enough to create an average, then test it against
|
||||
the dictionary
|
||||
|
||||
Things to optimise:
|
||||
* We only run the dictionary if it's 20% smaller than the average for chi squared
|
||||
* We consider it "English" if 45% of the text matches the dictionary
|
||||
* We run the dictionary if there is less than 10 total chisquared test
|
||||
|
||||
How to add a language:
|
||||
* Download your desired dictionary. Try to make it the most popular words, for example. Place this file into this
|
||||
folder with languagename.txt
|
||||
As an example, this comes built in with english.txt
|
||||
Find the statistical frequency of each letter in that language.
|
||||
For English, we have:
|
||||
self.languages = {
|
||||
"English":
|
||||
[0.0855, 0.0160, 0.0316, 0.0387, 0.1210,0.0218, 0.0209, 0.0496, 0.0733, 0.0022,0.0081, 0.0421, 0.0253, 0.0717,
|
||||
0.0747,0.0207, 0.0010, 0.0633, 0.0673, 0.0894,0.0268, 0.0106, 0.0183, 0.0019, 0.0172,0.0011]
|
||||
}
|
||||
In chisquared.py
|
||||
To add your language, do:
|
||||
self.languages = {
|
||||
"English":
|
||||
[0.0855, 0.0160, 0.0316, 0.0387, 0.1210,0.0218, 0.0209, 0.0496, 0.0733, 0.0022,0.0081, 0.0421, 0.0253, 0.0717,
|
||||
0.0747,0.0207, 0.0010, 0.0633, 0.0673, 0.0894,0.0268, 0.0106, 0.0183, 0.0019, 0.0172,0.0011]
|
||||
"German": [0.0973]
|
||||
}
|
||||
In alphabetical order
|
||||
And you're.... Done! Make sure the name of the two match up
|
||||
"""
|
||||
from typing import Dict, Set
|
||||
|
||||
from .iface import LanguageChecker
|
||||
from string import punctuation
|
||||
|
||||
from loguru import logger
|
||||
|
||||
import string
|
||||
import os
|
||||
import sys
|
||||
from loguru import logger
|
||||
from .chisquared import chiSquared
|
||||
|
||||
import cipheydists
|
||||
|
||||
sys.path.append("..")
|
||||
try:
|
||||
import mathsHelper as mh
|
||||
except ModuleNotFoundError:
|
||||
import ciphey.mathsHelper as mh
|
||||
|
||||
|
||||
class Brandon(LanguageChecker):
|
||||
"""
|
||||
Class designed to confirm whether something is **language** based on how many words of **language** appears
|
||||
Call confirmLanguage(text, language)
|
||||
* text: the text you want to confirm
|
||||
* language: the language you want to confirm
|
||||
|
||||
Find out what language it is by using chisquared.py, the highest chisquared score is the language
|
||||
languageThreshold = 45
|
||||
if a string is 45% **language** words, then it's confirmed to be english
|
||||
"""
|
||||
|
||||
wordlist: set
|
||||
|
||||
def cleanText(self, text: str) -> set:
|
||||
"""Cleans the text ready to be checked
|
||||
|
||||
Strips punctuation, makes it lower case, turns it into a set separated by spaces, removes duplicate words
|
||||
|
||||
Args:
|
||||
text -> The text we use to perform analysis on
|
||||
|
||||
Returns:
|
||||
text -> the text as a list, now cleaned
|
||||
|
||||
"""
|
||||
# makes the text unique words and readable
|
||||
text = text.lower()
|
||||
text = self.mh.strip_puncuation(text)
|
||||
text = text.split(" ")
|
||||
text = set(text)
|
||||
return text
|
||||
|
||||
    def checkWordlist(self, text: Set[str]) -> float:
        """Searches the wordlist and returns the proportion of the words that appear in it.

        Args:
            text -> the set of words we perform analysis on

        Returns:
            float -> the proportion of words in text that are also in the wordlist

        """
        # reads through the most common words / passwords
        # and calculates how much of the text they cover
        return len(text.intersection(self.wordlist)) / len(text)

    def check1000Words(self, text: Set[str]) -> bool:
        """Checks whether any word in the text is in the list of the 1000 most common words.

        top1000Words is a dict, so each lookup is O(1).

        Args:
            text -> the set of words to check

        Returns:
            bool -> whether any word is in the dict or not

        """
        # If we have no wordlist, then we can't reject the candidate on this basis
        if self.top1000Words is None:
            return True

        if text is None:
            return False
        # If any of the top 1000 words appear in the text, return True
        for word in text:
            # I was debating using any() here, but I think they're the
            # same speed so it doesn't really matter too much
            if word in self.top1000Words:
                return True
        return False

    def confirmLanguage(self, text: set) -> bool:
        """Confirms whether the given text is in the language.

        If the proportion (taken from checkWordlist) is at least the language threshold, returns True.

        Args:
            text -> the set of words to check

        Returns:
            bool -> whether the text is written in the language or not

        """
        proportion = self.checkWordlist(text)
        if proportion >= self.languageThreshold:
            logger.trace(f"The language proportion {proportion} is over the threshold {self.languageThreshold}")
            return True
        else:
            logger.trace(f"The language proportion {proportion} is under the threshold {self.languageThreshold}")
            return False

    def __init__(self, config: dict):
        # Suppresses warning
        super().__init__(config)
        self.mh = mh.mathsHelper()
        self.languageThreshold = config["params"].get("threshold", 0.55)
        self.top1000Words = config["params"].get("top1000")
        self.wordlist = config["wordlist"]

    def checkLanguage(self, text: str) -> bool:
        """Checks to see if the text is in English.

        Runs the top-1000-words check first, then the full dictionary check.

        Args:
            text -> the text we perform analysis on

        Returns:
            bool -> True if the text is English, False otherwise

        """
        logger.trace(f"In Language Checker with \"{text}\"")
        text = self.cleanText(text)
        logger.trace(f"Text split to \"{text}\"")
        if not text:
            return False
        if not self.check1000Words(text):
            logger.debug("1000 words failed. This is not plaintext")
            return False

        logger.trace("1000words check passed")
        if not self.confirmLanguage(text):
            logger.debug("Dictionary check failed. This is not plaintext")
            return False

        logger.trace("Dictionary check passed. This is plaintext")
        return True

    @staticmethod
    def getArgs() -> Dict[str, object]:
        return {
            "top1000": {"desc": "A json dictionary of the top 1000 words", "req": False},
            "threshold": {"desc": "The minimum proportion (between 0 and 1) that must be in the dictionary", "req": False},
        }


# Define alias
ciphey_language_checker = Brandon
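The core of this checker is the set-intersection proportion used by checkWordlist. A minimal standalone sketch of that idea (the wordlist and sample text here are made up for illustration):

```python
def word_proportion(text: str, wordlist: set) -> float:
    # Lower-case, split on spaces, and dedupe into a set of unique words
    words = set(text.lower().split(" "))
    # Proportion of unique words that also appear in the wordlist
    return len(words & wordlist) / len(words)

wordlist = {"hello", "my", "name", "is"}
print(word_proportion("Hello my name is bee", wordlist))  # → 0.8
```

Because the text is deduplicated into a set first, repeated words count once; the threshold comparison in confirmLanguage then works on this proportion.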
@ -1,81 +0,0 @@
"""
|
||||
██████╗██╗██████╗ ██╗ ██╗███████╗██╗ ██╗
|
||||
██╔════╝██║██╔══██╗██║ ██║██╔════╝╚██╗ ██╔╝
|
||||
██║ ██║██████╔╝███████║█████╗ ╚████╔╝
|
||||
██║ ██║██╔═══╝ ██╔══██║██╔══╝ ╚██╔╝
|
||||
╚██████╗██║██║ ██║ ██║███████╗ ██║
|
||||
© Brandon Skerritt
|
||||
Github: brandonskerritt
|
||||
|
||||
Class calculates the Chi squared score
|
||||
"""
|
||||
from string import punctuation
|
||||
from numpy import std
|
||||
import sys
|
||||
|
||||
sys.path.append("..")
|
||||
try:
|
||||
import mathsHelper as mh
|
||||
except ModuleNotFoundError:
|
||||
import ciphey.mathsHelper as mh
|
||||
from loguru import logger
|
||||
import cipheycore
|
||||
import cipheydists
|
||||
|
||||
# I had a bug where empty string was being added to letter freq dictionary
|
||||
# this solves it :)
|
||||
punctuation += " "
|
||||
NUMBERS = "1234567890"
|
||||
|
||||
|
||||
class chiSquared:
|
||||
"""Class that calculates the Chi squared score and tries to work out what language it might be
|
||||
to add a new language, go into this class (/app/languageChecker/chisquared.py)
|
||||
Find "self.languages" and add it to the dictionary like "German":[0.789, 0.651...]
|
||||
The list is the letter frequency ordered in alphabetical order """
|
||||
|
||||
def __init__(self):
|
||||
self.language = cipheydists.get_dist("twist")
|
||||
self.average = 0.0
|
||||
self.totalDone = 0.0
|
||||
self.oldAverage = 0.0
|
||||
self.mh = mh.mathsHelper()
|
||||
self.highestLanguage = ""
|
||||
self.totalChi = 0.0
|
||||
self.totalEqual = False
|
||||
self.chisAsaList = []
|
||||
|
||||
# these are settings that may impact how the program works overall
|
||||
self.chiSquaredSignificanceThreshold = 0.001 # The p value that we reject below
|
||||
|
||||
def checkChi(self, text):
|
||||
if text is None:
|
||||
return False
|
||||
if type(text) is bytes:
|
||||
try:
|
||||
text = text.decode()
|
||||
except:
|
||||
return None
|
||||
"""Checks to see if the Chi score is good
|
||||
if it is, it returns True
|
||||
Call this when you want to determine whether something is likely to be Chi or not
|
||||
|
||||
Arguments:
|
||||
* text - the text you want to run a Chi Squared score on
|
||||
|
||||
Outputs:
|
||||
* True - if it has a significantly lower chi squared score
|
||||
* False - if it doesn't have a significantly lower chi squared score
|
||||
"""
|
||||
# runs after every chi squared to see if it's 1 significantly lower than averae
|
||||
# the or statement is bc if the program has just started I don't want it to ignore the
|
||||
# ones at the start
|
||||
analysis = cipheycore.analyse_string(text)
|
||||
chisq = cipheycore.chisq_test(analysis, self.language)
|
||||
logger.debug(f"Chi-squared p-value is {chisq}")
|
||||
return chisq > self.chiSquaredSignificanceThreshold
|
||||
|
||||
def getMostLikelyLanguage(self):
|
||||
"""Returns what the most likely language is
|
||||
Only used when the threshold of checkChi is reached"""
|
||||
return self.highestLanguage
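cipheycore computes the chi-squared test in native code; the statistic itself is simple enough to sketch in plain Python. The two-letter frequency table below is illustrative, not a real language distribution:

```python
def chi_squared_stat(text: str, expected_freq: dict) -> float:
    # Keep only characters we have an expected frequency for
    letters = [c for c in text.lower() if c in expected_freq]
    n = len(letters)
    stat = 0.0
    for letter, p in expected_freq.items():
        observed = letters.count(letter)
        expected = p * n
        # Classic Pearson chi-squared term: (O - E)^2 / E
        stat += (observed - expected) ** 2 / expected
    return stat

# A text that exactly matches the expected distribution scores 0
print(chi_squared_stat("abab", {"a": 0.5, "b": 0.5}))  # → 0.0
```

The real code then converts the statistic into a p-value and rejects candidates whose p-value falls below chiSquaredSignificanceThreshold.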
@ -1,18 +0,0 @@
from abc import ABC, abstractmethod
from typing import Dict


class LanguageChecker(ABC):
    @staticmethod
    @abstractmethod
    def getArgs(**kwargs) -> Dict[str, object]:
        """The returned dictionary must be of the format:
        {<name:string>: {"req": <required:bool>, "desc": <description:string>}, ...}
        """
        pass

    @abstractmethod
    def checkLanguage(self, text: str) -> bool: pass

    @abstractmethod
    def __init__(self, config: Dict[str, object]): pass
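To show how the interface is meant to be filled in, here is a deliberately trivial checker; the class name and its "contains hello" rule are invented for illustration (the abstract base is repeated inline so the sketch is self-contained):

```python
from abc import ABC, abstractmethod
from typing import Dict

class LanguageChecker(ABC):
    @staticmethod
    @abstractmethod
    def getArgs(**kwargs) -> Dict[str, object]: pass

    @abstractmethod
    def checkLanguage(self, text: str) -> bool: pass

class HelloChecker(LanguageChecker):
    """Toy checker: 'plaintext' is anything containing the word hello."""
    def __init__(self, config: Dict[str, object]):
        self.config = config

    @staticmethod
    def getArgs(**kwargs) -> Dict[str, object]:
        # No configurable parameters for this toy checker
        return {}

    def checkLanguage(self, text: str) -> bool:
        return "hello" in text.lower()

print(HelloChecker({}).checkLanguage("Hello world"))  # → True
```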
@ -1 +1,7 @@
import __main__
from . import common

from . import iface

from . import basemods

from . import __main__


@ -4,313 +4,147 @@
██║     ██║██████╔╝███████║█████╗   ╚████╔╝
██║     ██║██╔═══╝ ██╔══██║██╔══╝    ╚██╔╝
╚██████╗██║██║     ██║  ██║███████╗   ██║
© Brandon Skerritt
https://github.com/brandonskerritt/ciphey
https://github.com/ciphey
https://docs.ciphey.online

The cycle goes:
main -> argparsing (if needed) -> call_encryption -> new Ciphey object -> decrypt() -> produceProbTable ->
one_level_of_decryption -> decrypt_normal

Ciphey can be called 3 ways:
echo 'text' | ciphey
ciphey 'text'
ciphey -t 'text'
main captures the first 2
argparsing captures the last one (-t)
it sends this to call_encryption, which can handle all 3 arguments using dict unpacking

decrypt() creates the prob table and prints it.

one_level_of_decryption() allows us to repeatedly call one_level_of_decryption on the inputs,
so if something is doubly encrypted, we can use this to find it.

decrypt_normal is one round of decryption. We need one_level_of_decryption to call it, as
one_level_of_decryption handles progress bars and stuff.
"""
import os
import warnings
import argparse
import sys
from typing import Optional, Tuple, Dict
from typing import Optional, Dict, Any, List
import bisect

from ciphey.iface import SearchLevel
from . import iface

from rich.console import Console
from rich.table import Column, Table
from rich.table import Table
from loguru import logger
import click
import click_spinner

warnings.filterwarnings("ignore")

# Depending on whether Ciphey is called, or Ciphey/__main__,
# we need different imports to deal with both cases
try:
    from ciphey.LanguageChecker import LanguageChecker as lc
    from ciphey.neuralNetworkMod.nn import NeuralNetwork
    from ciphey.Decryptor.basicEncryption.basic_parent import BasicParent
    from ciphey.Decryptor.Hash.hashParent import HashParent
    from ciphey.Decryptor.Encoding.encodingParent import EncodingParent
    import ciphey.mathsHelper as mh
except ModuleNotFoundError:
    from LanguageChecker import LanguageChecker as lc
    from neuralNetworkMod.nn import NeuralNetwork
    from Decryptor.basicEncryption.basic_parent import BasicParent
    from Decryptor.Hash.hashParent import HashParent
    from Decryptor.Encoding.encodingParent import EncodingParent
    import mathsHelper as mh

def decrypt(config: iface.Config, ctext: Any) -> List[SearchLevel]:
    """A simple alias for searching a ctext that makes the answer pretty"""
    res: iface.SearchResult = config.objs["searcher"].search(ctext)
    if config.verbosity < 0:
        return res.path[-1].result.value
    else:
        return iface.pretty_search_results(res)


def make_default_config(ctext: str, trace: bool = False) -> Dict[str, object]:
    from ciphey.LanguageChecker.brandon import ciphey_language_checker as brandon
    import cipheydists

    return {
        "ctext": ctext,
        "grep": False,
        "info": False,
        "debug": "TRACE" if trace else "WARNING",
        "checker": brandon,
        "wordlist": set(cipheydists.get_list("english")),
        "params": {},
    }


class Ciphey:
    config = dict()
    params = dict()

    def __init__(self, config):
        logger.remove()
        logger.configure()
        logger.add(sink=sys.stderr, level=config["debug"], colorize=sys.stderr.isatty())
        logger.opt(colors=True)
        logger.debug(f"""Debug level set to {config["debug"]}""")
        # general purpose modules
        self.ai = NeuralNetwork()
        self.lc = config["checker"](config)
        self.mh = mh.mathsHelper()
        # the one bit of text given to us to decrypt
        self.text: str = config["ctext"]
        self.basic = BasicParent(self.lc)
        self.hash = HashParent(self.lc)
        self.encoding = EncodingParent(self.lc)
        self.level: int = 1

        self.config = config

        self.console = Console()
        self.probability_distribution: dict = {}
        self.what_to_choose: dict = {}

    def decrypt(self) -> Optional[Dict]:
        """Performs the decryption of text.

        Creates the probability table, then calls one_level_of_decryption.

        Args:
            None, it uses class variables.

        Returns:
            The decryption output dict, or None if nothing was found.
        """
        # Read the documentation for more on this function.
        # checks to see if the inputted text is plaintext
        result = self.lc.checkLanguage(self.text)
        if result:
            print("You inputted plain text!")
            return {
                "lc": self.lc,
                "IsPlaintext?": True,
                "Plaintext": self.text,
                "Cipher": None,
                "Extra Information": None,
            }
        self.probability_distribution: dict = self.ai.predictnn(self.text)[0]
        self.what_to_choose: dict = {
            self.hash: {
                "sha1": self.probability_distribution[0],
                "md5": self.probability_distribution[1],
                "sha256": self.probability_distribution[2],
                "sha512": self.probability_distribution[3],
            },
            self.basic: {"caesar": self.probability_distribution[4]},
            "plaintext": {"plaintext": self.probability_distribution[5]},
            self.encoding: {
                "reverse": self.probability_distribution[6],
                "base64": self.probability_distribution[7],
                "binary": self.probability_distribution[8],
                "hexadecimal": self.probability_distribution[9],
                "ascii": self.probability_distribution[10],
                "morse": self.probability_distribution[11],
            },
        }

        logger.trace(
            f"The probability table before the 0.01 floor in __main__ is {self.what_to_choose}"
        )

        # sorts each individual sub-dictionary
        for key, value in self.what_to_choose.items():
            for k, v in value.items():
                # Sets all near-zero probabilities to 0.01; we want Ciphey to try all decryptions.
                if v < 0.01:
                    # this should turn off hashing functions if offline mode is turned on
                    self.what_to_choose[key][k] = 0.01
        logger.trace(
            f"The probability table after the 0.01 floor in __main__ is {self.what_to_choose}"
        )

        self.what_to_choose: dict = self.mh.sort_prob_table(self.what_to_choose)

        # Creates and prints the probability table
        if not self.config["grep"]:
            self.produceprobtable(self.what_to_choose)

        logger.debug(
            f"The new probability table after sorting in __main__ is {self.what_to_choose}"
        )

        """
        for each dictionary in the dictionary:
            sort that dictionary
        sort the overall dictionary by the first value of the new dictionary
        """
        output = None
        if self.level <= 1:
            output = self.one_level_of_decryption()
        else:
            # TODO: make tmpfile
            f = open("decryptionContents.txt", "w")
            output = self.one_level_of_decryption(file=f)

            for i in range(0, self.level):
                # open the file and go through each text item
                pass
        logger.debug(f"decrypt is outputting {output}")
        return output

    def produceprobtable(self, prob_table) -> None:
        """Produces the probability table using Rich's API.

        Args:
            prob_table -> the probability table generated by the neural network

        Returns:
            None, but prints the probability table.

        """
        logger.debug("Producing probability table")
        table = Table(show_header=True, header_style="bold magenta")
        table.add_column("Name of Cipher")
        table.add_column("Probability", justify="right")
        # for every key, value in the dict, add a row
        # key is the object (e.g. self.caesarcipher), not the display name
        sorted_dic: dict = {}
        for k, v in prob_table.items():
            for key, value in v.items():
                # Prevents the table from showing pointless 0.01 probs, as they're faked
                if value <= 0.01:
                    continue
                # gets the string ready to print
                logger.debug(f"Key is {str(key)} and value is {str(value)}")
                val: float = round(self.mh.percentage(value, 1), 2)
                key_str: str = str(key).capitalize()
                # converts "Bases" to "Base"
                if "Base" in key_str:
                    key_str = key_str[0:-2]
                sorted_dic[key_str] = val
                logger.debug(f"The value as a percentage is {val} and the key is {key_str}")
        sorted_dic: dict = {
            k: v
            for k, v in sorted(
                sorted_dic.items(), key=lambda item: item[1], reverse=True
            )
        }
        for k, v in sorted_dic.items():
            table.add_row(k, str(v) + "%")

        self.console.print(table)
        return None
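The sort used above (a dict comprehension over sorted items, descending by value) can be seen in isolation; the cipher names and percentages below are made up:

```python
probs = {"Caesar": 12.0, "Base64": 80.5, "Morse": 45.25}
# Rebuild the dict ordered by probability, highest first
sorted_probs = dict(sorted(probs.items(), key=lambda item: item[1], reverse=True))
print(list(sorted_probs))  # → ['Base64', 'Morse', 'Caesar']
```

Since Python 3.7, dicts preserve insertion order, so rebuilding the dict this way is enough to control the row order of the printed table.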

    def one_level_of_decryption(self) -> Optional[dict]:
        """Performs one level of decryption.

        Either uses a progress bar or not, depending on whether self.config["grep"] is set.

        Returns:
            The output of decrypt_normal, or None.

        """
        # Calls one level of decryption;
        # mainly used to control the progress bar
        output = None
        if self.config["grep"]:
            logger.debug("__main__ is running as greppable")
            output = self.decrypt_normal()
        else:
            logger.debug("__main__ is running with progress bar")
            output = self.decrypt_normal()
        return output

    def decrypt_normal(self, bar=None) -> Optional[dict]:
        """Called by one_level_of_decryption.

        Performs a decryption, but mainly parses the internal data packet and prints useful information.

        Args:
            bar -> whether or not to use alive_bar

        Returns:
            dict if a plaintext is found, or None if not

        """
        # This is redundant
        # result = self.lc.checkLanguage(self.text)
        # if result:
        #     print("You inputted plain text!")
        #     print(f"Returning {self.text}")
        #     return self.text

        logger.debug("In decrypt_normal")
        for key, val in self.what_to_choose.items():
            # https://stackoverflow.com/questions/4843173/how-to-check-if-type-of-a-variable-is-string
            if not isinstance(key, str):
                key.setProbTable(val)
                ret: dict = key.decrypt(self.text)
                logger.debug(f"Decrypt normal in __main__ ret is {ret}")
                logger.debug(
                    f"The plaintext is {ret['Plaintext']} and the extra information is {ret['Cipher']} and {ret['Extra Information']}"
                )

                if ret["IsPlaintext?"]:
                    logger.debug("Ret is plaintext")
                    print(ret["Plaintext"])
                    if self.config["info"]:
                        logger.trace("Self.cipher_info runs")
                        if ret["Extra Information"] is not None:
                            print(
                                "The cipher used is",
                                ret["Cipher"] + ".",
                                ret["Extra Information"] + ".",
                            )
                        else:
                            print("The cipher used is " + ret["Cipher"] + ".")
                    return ret

        logger.debug("No encryption found")
        print(
            """No encryption found. Here are some tips to help crack the cipher:
            * Use the probability table to work out what it could be. Base = base16, base32, base64 etc.
            * If the probability table says 'Caesar Cipher' then it is a normal encryption that \
Ciphey cannot decrypt yet.
            * If Ciphey thinks it's a hash, try using hash-identifier to find out what hash it is, \
and then HashCat to crack the hash.
            * The encryption may not contain normal English plaintext. It could be coordinates or \
another object not found in the dictionary. Use 'ciphey -d true > log.txt' to generate a log \
file of all attempted decryptions and manually search it."""
        )
        return None


# def arg_parsing(config: iface.Config) -> Optional[Dict[str, Any]]:
#     """This function parses arguments.
#
#     Args:
#         config: The configuration object
#     Returns:
#         The config to be passed around for the rest of time
#     """
#
#     # parser.add_argument(
#     #     "--default-wordlist",
#     #     help="Sets the default wordlist",
#     #     action="store",
#     #     default=None
#     # )
#
#     args = config
#
#     # First, we should work out how verbose we should be
#
#     # Now we have set the log level, we can start debugging
#     logger.trace(f"Got arguments {args}")
#
#     # the below text does:
#     # * if -t is supplied, use that
#     # * if ciphey is called like:
#     #     * REMOVED: ciphey 'encrypted text' use that
#     # else if data is piped like:
#     #     echo 'hello' | ciphey use that
#     # if no data is supplied, no arguments supplied.
#     text = None
#     if args["text"] is not None:
#         text = args["text"]
#     else:
#         print("No input given.")
#         exit(1)
#
#     if len(sys.argv) == 1:
#         print("No arguments were supplied. Look at the help menu with -h or --help")
#         return None
#
#     args["text"] = text
#     if len(args["text"]) < 3:
#         print("A string of less than 3 chars cannot be interpreted by Ciphey.")
#         return None
#
#     # Now we can walk through the arguments, expanding them into the config struct
#     config["checker"] = args.get("checker")
#     config["info"] = args.get("info")
#     config["in"] = args.get("bytes_input")
#     config["out"] = args.get("bytes_output")
#     config["default_dist"] = args.get("default_dist")
#
#     # Append the module lists:
#     if not "modules" in config:
#         config["modules"] = args["module"]
#     else:
#         config["modules"] += args["module"]
#     print(f"Config modules is {config['modules']}")
#     config.load_modules()
#     # Now we can walk through the arguments, expanding them into a canonical form
#     #
#     # First, we go over simple args
#     config["info"] = False
#     config["ctext"] = args["text"]
#     config["grep"] = args["greppable"]
#     config["offline"] = args["offline"]
#
#     # Verbosity levels
#     if args["verbose"] >= 3:
#         config["debug"] = "TRACE"
#         config.update_log_level("TRACE")
#     elif args["verbose"] == 2:
#         config["debug"] = "DEBUG"
#         config.update_log_level("DEBUG")
#     elif args["verbose"] == 1:
#         config["debug"] = "ERROR"
#         config.update_log_level("ERROR")
#     else:
#         config["debug"] = "WARNING"
#
#     if args["silent"]:
#         config.update_log_level(None)
#         config.grep = True
#
#     # Try to locate the language checker module
#     # TODO: actually implement this
#     from ciphey.LanguageChecker.brandon import ciphey_language_checker as brandon
#
#     config["checker"] = brandon
#     # Try to locate the language checker module
#     # TODO: actually implement this (should be similar)
#     import cipheydists
#
#     # Now we fill in the params *shudder*
#     for i in args["param"]:
#         key, value = i.split("=", 1)
#         parent, name = key.split(".", 1)
#         config.update_param(parent, name, value)
#
#     # Now we have parsed and loaded everything else, we can load the objects
#     config.load_objs()
#
#     return args
#

def get_name(ctx, param, value):
    # reads from stdin if the argument wasn't supplied
@ -323,105 +157,54 @@ def get_name(ctx, param, value):
    return locals()


def arg_parsing(args) -> Optional[dict]:
    """This function parses arguments.

    Args:
        args: the raw arguments from Click
    Returns:
        The config to be passed around for the rest of time
    """
    # the below text does:
    # if -t is supplied, use that
    # if ciphey is called like:
    #     ciphey 'encrypted text' use that
    # else if data is piped like:
    #     echo 'hello' | ciphey use that
    # if no data is supplied, no arguments supplied.
    text = None
    if args["text"] is not None:
        text = args["text"]
    else:
        print("No input given.")
        exit(1)

    if len(sys.argv) == 1:
        print("No arguments were supplied. Look at the help menu with -h or --help")
        return None

    args["text"] = text
    if len(args["text"]) < 3:
        print("A string of less than 3 chars cannot be interpreted by Ciphey.")
        return None

    config = dict()

    # Now we can walk through the arguments, expanding them into a canonical form
    #
    # First, we go over simple args
    config["info"] = False
    config["ctext"] = args["text"]
    config["grep"] = args["greppable"]
    config["offline"] = args["offline"]
    if args["verbose"] >= 3:
        config["debug"] = "TRACE"
    elif args["verbose"] == 2:
        config["debug"] = "DEBUG"
    elif args["verbose"] == 1:
        config["debug"] = "ERROR"
    else:
        config["debug"] = "WARNING"
    # Try to locate the language checker module
    # TODO: actually implement this
    from ciphey.LanguageChecker.brandon import ciphey_language_checker as brandon

    config["checker"] = brandon
    # Try to locate the language checker module
    # TODO: actually implement this (should be similar)
    import cipheydists

    config["wordlist"] = set(cipheydists.get_list("english"))
    # Now we fill in the params *shudder*
    config["params"] = {}
    return config


@click.command()
@click.option(
    "-t", "--text", help="The ciphertext you want to decrypt.", type=str,
)
@click.option(
    "-g",
    "--greppable",
    help="Only output the answer. Useful for grep.",
    "-i",
    "--info",
    help="Do you want information on the cipher used?",
    type=bool,
    is_flag=True,
)
@click.option(
    "-q",
    "--quiet",
    help="Decrease verbosity",
    type=int,
    count=True,
    default=None
)
@click.option(
    "-g",
    "--greppable",
    help="Only print the answer (useful for grep)",
    type=bool,
    is_flag=True,
    default=None
)
@click.option("-v", "--verbose", count=True, type=int)
@click.option(
    "-a",
    "-C",
    "--checker",
    help="Use the default internal checker. Defaults to brandon",
    type=bool,
    help="Use the given checker",
    default=None
)
@click.option(
    "-A",
    "--checker-path",
    help="Uses the language checker at the given path",
    type=click.Path(exists=True),
    "-c",
    "--config",
    help="Uses the given config file. Defaults to appdirs.user_config_dir('ciphey', 'ciphey')/'config.yml'",
)
@click.option("-w", "--wordlist", help="Uses the given internal wordlist")
@click.option("-w", "--wordlist", help="Uses the given wordlist")
@click.option(
    "-W",
    "--wordlist-file",
    help="Uses the wordlist at the given path",
    type=click.File("rb"),
    "-p",
    "--param",
    help="Passes a parameter to the language checker",
    multiple=True,
)
@click.option(
    "-p", "--param", help="Passes a parameter to the language checker", type=str
)
@click.option(
    "-l", "--list-params", help="List the parameters of the selected module", type=str,
    "-l", "--list-params", help="List the parameters of the selected module", type=bool,
)
@click.option(
    "-O",
@ -430,23 +213,49 @@ def arg_parsing(args) -> Optional[dict]:
    type=bool,
    is_flag=True,
)
@click.option(
    "--searcher",
    help="Select the searching algorithm to use",
)
# HARLAN TODO XXX
# I switched this to a boolean flag system
# https://click.palletsprojects.com/en/7.x/options/#boolean-flags
# True for bytes input, False for str
@click.option(
    "-b",
    "--bytes-input",
    help="Forces ciphey to use binary mode for the input. Rather experimental and may break things!",
    is_flag=True,
    default=None
)
# HARLAN TODO XXX
# I switched this to a boolean flag system
# https://click.palletsprojects.com/en/7.x/options/#boolean-flags
@click.option(
    "-B",
    "--bytes-output",
    help="Forces ciphey to use binary mode for the output. Rather experimental and may break things!",
    is_flag=True,
    default=None
)
@click.option(
    "--default-dist",
    help="Sets the default character/byte distribution",
    type=str,
    default=None
)
@click.option(
    "-m", "--module", help="Adds a module from the given path", type=click.Path(), multiple=True,
)
@click.option(
    "-A",
    "--appdirs",
    help="Print the location of where Ciphey wants the settings file to be",
    type=bool
)
@click.argument("text_stdin", callback=get_name, required=False)
@click.argument("file_stdin", type=click.File("rb"), required=False)
def main(
    text,
    greppable,
    verbose,
    checker,
    checker_path,
    wordlist,
    wordlist_file,
    param,
    list_params,
    offline,
    text_stdin,
    file_stdin,
    config: Dict[str, object] = None,
) -> Optional[dict]:
def main(**kwargs) -> Optional[dict]:
    """Ciphey - Automated Decryption Tool

    Documentation:
@ -459,7 +268,7 @@ def main(
    Ciphey is an automated decryption tool using smart artificial intelligence and natural language processing. Input encrypted text, get the decrypted text back.

    Examples:\n
        Basic Usage: ciphey -t "aGVsbG8gbXkgbmFtZSBpcyBiZWU="
        Basic Usage: ciphey -t "aGVsbG8gbXkgbmFtZSBpcyBiZWU=" -d true -c true

    """


@ -476,46 +285,90 @@ def main(
        The output of the decryption.
    """
    if config is None:
        config = locals()

    if config["text"] is None:
        if file_stdin is not None:
            config["text"] = file_stdin.read().decode("utf-8")
        elif text_stdin is not None:
            config["text"] = text_stdin
        else:
            print("No inputs were given to Ciphey. Run ciphey --help")
            return None
    # if the user wants to know where appdirs is,
    # print it and exit
    if kwargs["appdirs"] is not None:
        import appdirs
        appname = "ciphey"
        return None

    config = arg_parsing(config)
    # Check if we errored out
    if config is None:
        # Now we create the config object
        config = iface.Config()

    # Default-init the config object
    config = iface.Config()

    # Load the settings file into the config
    cfg_arg = kwargs["config"]
    if cfg_arg is None:
        # Make sure that the config dir actually exists
        os.makedirs(iface.Config.get_default_dir(), exist_ok=True)
        config.load_file(create=True)
    else:
        config.load_file(cfg_arg)

    # Load the verbosity, so that we can start logging
    verbosity = kwargs["verbose"]
    quiet = kwargs["quiet"]
    if verbosity is None:
        if quiet is not None:
            verbosity = -quiet
    elif quiet is not None:
        verbosity -= quiet
    if kwargs["greppable"] is not None:
        verbosity -= 999
    # Use the existing value as a base
    config.verbosity += verbosity
    config.update_log_level(config.verbosity)
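The verbosity arithmetic above (stacked -v flags raise it, -q flags lower it, and --greppable forces it far below any logging threshold) can be sketched as a standalone function; the function name and base value here are invented for illustration:

```python
def effective_verbosity(base: int, verbose=None, quiet=None, greppable=False) -> int:
    # -v flags raise verbosity; -q flags lower it
    if verbose is None:
        verbose = -quiet if quiet is not None else 0
    elif quiet is not None:
        verbose -= quiet
    # --greppable silences logging entirely
    if greppable:
        verbose -= 999
    return base + verbose

print(effective_verbosity(0, verbose=2, quiet=1))  # → 1
```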
    logger.trace(f"Got cmdline args {kwargs}")

    # Now we load the modules
    module_arg = kwargs["module"]
    if module_arg is not None:
        config.modules += list(module_arg)
        config.load_modules()

    # We need to load formats BEFORE we instantiate objects
    if kwargs["bytes_input"] is not None:
        config.update_format("in", "bytes")

    output_format = kwargs["bytes_output"]
    if kwargs["bytes_output"] is not None:
        config.update_format("out", "bytes")

    # Next, load the objects
    params = kwargs["param"]
    if params is not None:
        for i in params:
            key, value = i.split("=", 1)
            parent, name = key.split(".", 1)
            config.update_param(parent, name, value)
|
||||
config.update("checker", kwargs["checker"])
|
||||
config.update("searcher", kwargs["searcher"])
|
||||
config.update("default_dist", kwargs["default_dist"])
|
||||
config.load_objs()
|
||||
|
||||
logger.trace(f"Config finalised: {config}")
|
||||
|
||||
# Finally, we load the plaintext
|
||||
if kwargs["text"] is None:
|
||||
if kwargs["file_stdin"] is not None:
|
||||
kwargs["text"] = kwargs["file_stdin"].read().decode("utf-8")
|
||||
elif kwargs["text_stdin"] is not None:
|
||||
kwargs["text"] = kwargs["text_stdin"]
|
||||
else:
|
||||
print("No inputs were given to Ciphey. For usage, run ciphey --help")
|
||||
logger.critical("No text input given!")
|
||||
return None
|
||||
|
||||
return main_decrypt(config)
|
||||
|
||||
# Now we have working arguments, we can expand it and pass it to the Ciphey constructor
|
||||
|
||||
|
||||
def main_decrypt(config: Dict[str, object] = None) -> Optional[dict]:
|
||||
"""Calls the decrypt, acts as a 2nd main
|
||||
|
||||
The problem is that Click fails to run when importing and using main()
|
||||
|
||||
If I make a new function for Click, I have to change so much just to make it work.
|
||||
|
||||
If I make a new function for using the default config, and acting as a 2nd main -- I have to change less
|
||||
Thus, this function exists."""
|
||||
if config is None:
|
||||
print("No config file.")
|
||||
exit(1)
|
||||
|
||||
cipher_obj = Ciphey(config)
|
||||
return cipher_obj.decrypt()
|
||||
print(decrypt(config, kwargs["text"]))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# withArgs because this function is only called
|
||||
# if the program is run in terminal
|
||||
main()
|
||||
#with click_spinner.spinner():
|
||||
# result = main()
|
||||
result = main()
|
||||
if result is not None:
|
||||
print(result)
@@ -0,0 +1 @@
from . import quorum, regex, brandon
@@ -0,0 +1,313 @@
"""
 ██████╗██╗██████╗ ██╗  ██╗███████╗██╗   ██╗
██╔════╝██║██╔══██╗██║  ██║██╔════╝╚██╗ ██╔╝
██║     ██║██████╔╝███████║█████╗   ╚████╔╝
██║     ██║██╔═══╝ ██╔══██║██╔══╝    ╚██╔╝
╚██████╗██║██║     ██║  ██║███████╗   ██║
© Brandon Skerritt
Github: brandonskerritt

Class to determine whether something is English or not.
1. Calculate the Chi Squared score of a sentence
2. If the score is significantly lower than the average score, it _might_ be English
    2.1. If the score _might_ be English, then take the text and compare it to the sorted dictionary
    in O(n log n) time.
    It creates a percentage of "How much of this text is in the dictionary?"
    The dictionary contains:
        * 20,000 most common US words
        * 10,000 most common UK words (there's no repetition between the two)
        * The top 10,000 passwords
    If the text "looks like" English (chi-squared) and it contains English words, we can conclude it is
    very likely English. The alternative is doing the dictionary check against the entire 479k-word dictionary (slower).
    2.2. If the score is not English, but we haven't tested enough to create an average, then test it against
    the dictionary

Things to optimise:
* We only run the dictionary if the chi-squared score is 20% smaller than the average
* We consider it "English" if 45% of the text matches the dictionary
* We run the dictionary if there have been fewer than 10 chi-squared tests in total

How to add a language:
* Download your desired dictionary. Try to make it the most popular words, for example. Place this file into this
  folder as languagename.txt
As an example, this comes built in with english.txt
Find the statistical frequency of each letter in that language.
For English, we have:
    self.languages = {
        "English":
        [0.0855, 0.0160, 0.0316, 0.0387, 0.1210, 0.0218, 0.0209, 0.0496, 0.0733, 0.0022, 0.0081, 0.0421, 0.0253, 0.0717,
         0.0747, 0.0207, 0.0010, 0.0633, 0.0673, 0.0894, 0.0268, 0.0106, 0.0183, 0.0019, 0.0172, 0.0011]
    }
In chisquared.py
To add your language, do:
    self.languages = {
        "English":
        [0.0855, 0.0160, 0.0316, 0.0387, 0.1210, 0.0218, 0.0209, 0.0496, 0.0733, 0.0022, 0.0081, 0.0421, 0.0253, 0.0717,
         0.0747, 0.0207, 0.0010, 0.0633, 0.0673, 0.0894, 0.0268, 0.0106, 0.0183, 0.0019, 0.0172, 0.0011],
        "German": [0.0973]
    }
In alphabetical order
And you're.... done! Make sure the names of the two match up.
"""
import os
import sys
from math import ceil
from string import punctuation
from typing import Dict, Set, Optional, Any

from loguru import logger

import ciphey
from ciphey.iface import T, registry

sys.path.append("..")
try:
    import mathsHelper as mh
except ModuleNotFoundError:
    import ciphey.mathsHelper as mh


@registry.register
class Brandon(ciphey.iface.Checker[str]):
    """
    Class designed to confirm whether something is **language** based on how many words of **language** appear

    Call confirmLanguage(text, language)
    * text: the text you want to confirm
    * language: the language you want to confirm

    Find out what language it is by using chisquared.py; the highest chi-squared score is the language
    languageThreshold = 45
    if a string is 45% **language** words, then it's confirmed to be that language
    """

    wordlist: set

    def getExpectedRuntime(self, text: T) -> float:
        # TODO: actually work this out
        return 1e-4  # 100 µs

    def clean_text(self, text: str) -> set:
        """Cleans the text ready to be checked

        Strips punctuation, makes it lower case, splits it on spaces and
        turns it into a set (which also removes duplicate words)

        Args:
            text -> The text we perform analysis on

        Returns:
            text -> the cleaned text, as a set of words

        """
        # makes the text unique words and readable
        text = text.lower()
        text = self.mh.strip_puncuation(text)
        text = text.split(" ")
        text = set(text)
        return text

    def checker(self, text: str, threshold: float, text_length: int, var: set) -> bool:
        """Given text, determine whether it passes the checker

        The checker uses the variable passed to it, i.e. a stopwords list, 1k words list, or dictionary

        Args:
            text -> The text to check
            threshold -> the proportion of text that must be in var before we return True
            text_length -> the length of the text
            var -> the variable we are checking against: stopwords list, 1k words list, or dictionary list
        Returns:
            boolean -> True if it passes the test, False if it fails."""
        if text is None:
            logger.trace("Checker's text is None, so returning False")
            return False
        if var is None:
            logger.trace("Checker's input var is None, so returning False")
            return False

        percent = ceil(text_length * threshold)
        logger.trace(f"Checker's chunks are size {percent}")
        meet_threshold = 0
        meet_threshold_percent = 0
        location = 0
        end = percent

        # chunks the text, so we only look at THRESHOLD-sized chunks at a time
        text = list(text)
        while location <= text_length:
            to_analyse = text[location:end]
            logger.trace(f"To analyse is {to_analyse}")
            for word in to_analyse:
                # if word is in the wordlist, add 1 to the counter
                if word in var:
                    logger.trace(
                        f"{word} is in var, which means I am +=1 to the meet_threshold which is {meet_threshold}"
                    )
                    meet_threshold += 1
                meet_threshold_percent = meet_threshold / text_length
                if meet_threshold_percent >= threshold:
                    logger.trace(
                        f"Returning true since the percentage is {meet_threshold / text_length} and the threshold is {threshold}"
                    )
                    # if we meet the threshold, return True
                    # otherwise, go over again until we do
                    # We do this in the for loop because if we're at 24% and THRESHOLD is 25
                    # we don't want to wait for the next chunk; we want to return True ASAP
                    return True
            location = end
            end = end + percent
        logger.trace(
            f"The language proportion {meet_threshold_percent} is under the threshold {threshold}"
        )
        return False

    def __init__(self, config: ciphey.iface.Config):
        # Suppresses warning
        super().__init__(config)
        self.mh = mh.mathsHelper()

        phases = config.get_resource(self._params()["phases"])

        self.thresholds_phase1 = phases["1"]
        self.thresholds_phase2 = phases["2"]
        self.top1000Words = config.get_resource(self._params().get("top1000"))
        self.wordlist = config.get_resource(self._params()["wordlist"])
        self.stopwords = config.get_resource(self._params().get("stopwords"))

        self.len_phase1 = len(self.thresholds_phase1)
        self.len_phase2 = len(self.thresholds_phase2)

    def check(self, text: str) -> Optional[str]:
        """Checks to see if the text is in English

        Args:
            text -> The text we perform analysis on

        Returns:
            Optional[str] -> "" if the text is English, None otherwise.

        """
        logger.trace(f'In Language Checker with "{text}"')
        text = self.clean_text(text)
        logger.trace(f'Text split to "{text}"')
        if not text:
            return None

        length_text = len(text)

        # Example phase table:
        # "Phase 1": {0: {"check": 0.02}, 110: {"stop": 0.15}, 150: {"stop": 0.28}}

        # Phase 1 checking

        # this code decides what checker / threshold to use
        # if text is over or equal to maximum size, just use the maximum possible checker
        what_to_use = self.calculateWhatChecker(
            length_text, self.thresholds_phase1.keys()
        )
        logger.trace(f"What to use is {what_to_use}")
        logger.trace(self.thresholds_phase1)
        what_to_use = self.thresholds_phase1[str(what_to_use)]
        if "check" in what_to_use:
            # perform the 1k words check
            result = self.checker(
                text, what_to_use["check"], length_text, self.top1000Words
            )
            logger.trace(f"The result from the 1k words check is {result}")
        elif "stop" in what_to_use:
            # perform the stopwords check
            result = self.checker(
                text, what_to_use["stop"], length_text, self.stopwords
            )
            logger.trace(f"The result from the stopwords check is {result}")
        else:
            logger.debug(f"It is neither stop nor check, but instead {what_to_use}")
            result = False

        # return None if phase 1 fails
        if not result:
            return None
        else:
            what_to_use = self.calculateWhatChecker(
                length_text, self.thresholds_phase2.keys()
            )
            what_to_use = self.thresholds_phase2[str(what_to_use)]
            result = self.checker(
                text, what_to_use["dict"], length_text, self.wordlist
            )
            logger.trace(f"Result of dictionary checker is {result}")
            return "" if result else None

    def calculateWhatChecker(self, length_text, key):
        """Calculates what threshold / checker to use

        If the length of the text is over the maximum sentence length, use the last checker / threshold
        Otherwise, traverse the keys backwards until we find the key range the sentence length fits into,
        i.e. between the previous key and the current one.
        In this way, we find the absolute lowest checker / percentage threshold.
        We traverse backwards because if the text is longer than the max sentence length, we already know.
        In total, the keys are only about 5 items long, so traversal in either direction is cheap.

        Args:
            length_text -> The length of the text
            key -> The keys we want to use, i.e. Phase 1 keys or Phase 2 keys.
        Returns:
            what_to_use -> the key of the lowest checker."""

        _keys = list(map(int, list(key)))
        what_to_use = _keys[0]
        if length_text >= _keys[-1]:
            what_to_use = _keys[-1]
        else:
            # this algorithm finds the smallest possible fit for the text
            for counter, i in reversed(list(enumerate(_keys))):
                if counter != 0:
                    if _keys[counter - 1] <= length_text <= i:
                        what_to_use = i
        return what_to_use

    @staticmethod
    def getParams() -> Optional[Dict[str, ciphey.iface.ParamSpec]]:
        return {
            "top1000": ciphey.iface.ParamSpec(
                desc="A wordlist of the top 1000 words",
                req=False,
                default="cipheydists::list::english1000",
            ),
            "wordlist": ciphey.iface.ParamSpec(
                desc="A wordlist of all the words",
                req=False,
                default="cipheydists::list::english",
            ),
            "stopwords": ciphey.iface.ParamSpec(
                desc="A wordlist of StopWords",
                req=False,
                default="cipheydists::list::englishStopWords",
            ),
            "threshold": ciphey.iface.ParamSpec(
                desc="The minimum proportion (between 0 and 1) that must be in the dictionary",
                req=False,
                default=0.45,
            ),
            "phases": ciphey.iface.ParamSpec(
                desc="Language-specific phase thresholds",
                req=False,
                default="cipheydists::brandon::english",
            ),
        }
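The chunked, early-exit scan that `Brandon.checker` performs can be sketched as a standalone function (the helper name is illustrative, not part of Ciphey): walk the cleaned words in chunks of `ceil(length * threshold)` and succeed as soon as the matched proportion clears the threshold.

```python
from math import ceil

def chunked_dictionary_check(words, wordlist, threshold):
    """Sketch of Brandon.checker's early-exit chunked scan."""
    words = list(words)
    length = len(words)
    if length == 0:
        return False
    step = ceil(length * threshold)
    matched = 0
    for start in range(0, length, step):
        for word in words[start:start + step]:
            if word in wordlist:
                matched += 1
            # exit as soon as the proportion clears the threshold
            if matched / length >= threshold:
                return True
    return False
```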
@@ -0,0 +1,49 @@
from math import ceil
from typing import Optional, Dict, Generic

import ciphey
from ciphey.iface import ParamSpec, Config, T


class Quorum(Generic[T], ciphey.iface.Checker[T]):
    def check(self, text: T) -> Optional[str]:
        left = self.k
        results = []
        for checker in self.checkers:
            results.append(checker.check(text))
            if results[-1] is None:
                continue
            left -= 1
            # Early return check
            if left == 0:
                return str(results)

    def __init__(self, config: Config):
        super().__init__(config)

        self.k = self._params().get("k")
        if self.k is None:
            self.k = len(self._params()["checker"])
        self.k = int(self.k)
        # These checks need to be separate, to make sure that we do not have zero members
        if self.k == 0 or self.k > len(self._params()["checker"]):
            raise IndexError(
                "k must be between 1 and the number of checkers (inclusive)"
            )

        self.checkers = []
        for i in self._params()["checker"]:
            # This enforces type consistency
            self.checkers.append(
                ciphey.iface._registry.get_named(i, ciphey.iface.Checker[T])
            )

    @staticmethod
    def getParams() -> Optional[Dict[str, ParamSpec]]:
        return {
            "checker": ParamSpec(
                req=True, desc="The checkers to be used for analysis", list=True
            ),
            "k": ParamSpec(
                req=False,
                desc="The minimum quorum size. Defaults to the number of checkers",
            ),
        }
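Stripped of the registry machinery, the quorum rule reduces to the following sketch (plain callables stand in for `Checker` objects; names are illustrative):

```python
def quorum_check(checkers, text, k):
    """Succeed once k checkers return a non-None verdict (early return)."""
    hits = 0
    for checker in checkers:
        if checker(text) is not None:
            hits += 1
            if hits == k:  # quorum met, stop early
                return True
    return False
```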
@@ -0,0 +1,40 @@
from typing import Optional, Dict

import re

from loguru import logger

import ciphey
from ciphey.iface import ParamSpec, T, Config, registry


@registry.register
class Regex(ciphey.iface.Checker[str]):
    def getExpectedRuntime(self, text: T) -> float:
        return 1e-5  # TODO: actually calculate this

    def __init__(self, config: Config):
        super().__init__(config)
        self.regexes = list(map(re.compile, self._params()["regex"]))
        logger.trace(f"There are {len(self.regexes)} regexes")

    def check(self, text: str) -> Optional[str]:
        for regex in self.regexes:
            logger.trace(f"Trying regex {regex} on {text}")
            res = regex.search(text)
            logger.trace(f"Results: {res}")
            if res:
                return f"passed with regex {regex}"

    @staticmethod
    def getParams() -> Optional[Dict[str, ParamSpec]]:
        return {
            "regex": ParamSpec(
                req=True,
                desc="The regex that must be matched (in a substring)",
                list=True,
            )
        }

    @staticmethod
    def getName() -> str:
        return "regex"
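The same idea works standalone with plain `re` (the patterns below are made-up examples, not Ciphey defaults): the checker succeeds as soon as any pattern matches a substring.

```python
import re

# Example patterns (illustrative only)
patterns = [r"flag\{[^}]*\}", r"CTF-\d+"]
regexes = [re.compile(p) for p in patterns]

def regex_check(text):
    # first matching pattern wins, mirroring Regex.check above
    for rx in regexes:
        if rx.search(text):
            return f"passed with regex {rx.pattern}"
    return None
```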
@@ -0,0 +1 @@
from . import caesar, vigenere
@@ -0,0 +1,112 @@
"""
 ██████╗██╗██████╗ ██╗  ██╗███████╗██╗   ██╗
██╔════╝██║██╔══██╗██║  ██║██╔════╝╚██╗ ██╔╝
██║     ██║██████╔╝███████║█████╗   ╚████╔╝
██║     ██║██╔═══╝ ██╔══██║██╔══╝    ╚██╔╝
╚██████╗██║██║     ██║  ██║███████╗   ██║
© Brandon Skerritt
Github: brandonskerritt
"""
from distutils import util
from typing import Optional, Dict, Union, Set, List

from loguru import logger
import ciphey
import cipheycore

from ciphey.iface import ParamSpec, CrackResult, T, CrackInfo, registry


@registry.register
class Caesar(ciphey.iface.Cracker[str]):
    def getInfo(self, ctext: T) -> CrackInfo:
        analysis = self.cache.get_or_update(
            ctext,
            "cipheycore::simple_analysis",
            lambda: cipheycore.analyse_string(ctext),
        )

        return CrackInfo(
            success_likelihood=cipheycore.caesar_detect(analysis, self.expected),
            # TODO: actually calculate runtimes
            success_runtime=1e-4,
            failure_runtime=1e-4,
        )

    @staticmethod
    def getTarget() -> str:
        return "caesar"

    def attemptCrack(self, ctext: str) -> List[CrackResult]:
        logger.debug("Trying caesar cipher")
        # Convert it to lower case
        #
        # TODO: handle different alphabets
        if self.lower:
            message = ctext.lower()
        else:
            message = ctext

        logger.trace("Beginning cipheycore simple analysis")

        # Hand it off to the core
        analysis = self.cache.get_or_update(
            ctext,
            "cipheycore::simple_analysis",
            lambda: cipheycore.analyse_string(message),
        )
        logger.trace("Beginning cipheycore::caesar")
        possible_keys = cipheycore.caesar_crack(
            analysis, self.expected, self.group, True, self.p_value
        )
        n_candidates = len(possible_keys)
        logger.debug(f"Caesar returned {n_candidates} candidates")

        candidates = []

        for candidate in possible_keys:
            translated = cipheycore.caesar_decrypt(message, candidate.key, self.group)
            candidates.append(CrackResult(value=translated, key_info=candidate.key))

        return candidates

    @staticmethod
    def getParams() -> Optional[Dict[str, ParamSpec]]:
        return {
            "expected": ciphey.iface.ParamSpec(
                desc="The expected distribution of the plaintext",
                req=False,
                config_ref=["default_dist"],
            ),
            "group": ciphey.iface.ParamSpec(
                desc="An ordered sequence of chars that make up the caesar cipher alphabet",
                req=False,
                default="abcdefghijklmnopqrstuvwxyz",
            ),
            "lower": ciphey.iface.ParamSpec(
                desc="Whether or not the ciphertext should be converted to lowercase first",
                req=False,
                default=True,
            ),
            "p_value": ciphey.iface.ParamSpec(
                desc="The p-value to use for standard frequency analysis",
                req=False,
                default=0.1,
            ),
            # TODO: add "filter" param
        }

    @staticmethod
    def scoreUtility() -> float:
        return 1.5

    def __init__(self, config: ciphey.iface.Config):
        super().__init__(config)
        self.lower: Union[str, bool] = self._params()["lower"]
        if type(self.lower) != bool:
            self.lower = util.strtobool(self.lower)
        self.group = list(self._params()["group"])
        self.expected = config.get_resource(self._params()["expected"])
        self.cache = config.cache
        self.p_value = self._params()["p_value"]
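`cipheycore.caesar_crack` is a native extension; as a rough pure-Python stand-in for what a caesar attack does with the default lowercase `group`, brute-force every shift (helper names are illustrative, not part of cipheycore):

```python
import string

ALPHABET = string.ascii_lowercase

def caesar_decrypt(ctext, key):
    # shift every letter back by `key`; characters outside the alphabet pass through
    shifted = ALPHABET[key:] + ALPHABET[:key]
    return ctext.translate(str.maketrans(shifted, ALPHABET))

def caesar_bruteforce(ctext):
    # all 26 candidate plaintexts; a checker would then pick the English one
    return [caesar_decrypt(ctext, k) for k in range(26)]
```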
@@ -0,0 +1,251 @@
"""
 ██████╗██╗██████╗ ██╗  ██╗███████╗██╗   ██╗
██╔════╝██║██╔══██╗██║  ██║██╔════╝╚██╗ ██╔╝
██║     ██║██████╔╝███████║█████╗   ╚████╔╝
██║     ██║██╔═══╝ ██╔══██║██╔══╝    ╚██╔╝
╚██████╗██║██║     ██║  ██║███████╗   ██║
© Brandon Skerritt
Github: brandonskerritt
"""
from distutils import util
from typing import Optional, Dict, Union, Set, List

import re

from loguru import logger
import ciphey
import cipheycore

from ciphey.iface import ParamSpec, Cracker, CrackResult, T, CrackInfo, registry


@registry.register
class Vigenere(ciphey.iface.Cracker[str]):
    def getInfo(self, ctext: T) -> CrackInfo:
        if self.keysize is not None:
            analysis = self.cache.get_or_update(
                ctext,
                f"vigenere::{self.keysize}",
                lambda: cipheycore.analyse_string(ctext, self.keysize, self.group),
            )

            return CrackInfo(
                success_likelihood=cipheycore.vigenere_detect(analysis, self.expected),
                # TODO: actually calculate runtimes
                success_runtime=1e-4,
                failure_runtime=1e-4,
            )
        else:
            return CrackInfo(
                success_likelihood=0.5,  # TODO: actually work this out
                # TODO: actually calculate runtimes
                success_runtime=1e-4,
                failure_runtime=1e-4,
            )

    @staticmethod
    def getTarget() -> str:
        return "vigenere"

    def crackOne(
        self, ctext: str, analysis: cipheycore.windowed_analysis_res
    ) -> List[CrackResult]:
        possible_keys = cipheycore.vigenere_crack(
            analysis, self.expected, self.group, self.p_value
        )
        return [
            CrackResult(
                value=cipheycore.vigenere_decrypt(ctext, candidate.key, self.group),
                key_info="".join([self.group[i] for i in candidate.key]),
            )
            for candidate in possible_keys
        ]

    def attemptCrack(self, ctext: str) -> List[CrackResult]:
        logger.debug("Trying vigenere cipher")
        # Convert it to lower case
        if self.lower:
            message = ctext.lower()
        else:
            message = ctext

        # Analysis must be done here, where we know the case for the cache
        if self.keysize is not None:
            return self.crackOne(
                message,
                self.cache.get_or_update(
                    ctext,
                    f"vigenere::{self.keysize}",
                    lambda: cipheycore.analyse_string(ctext, self.keysize, self.group),
                ),
            )
        else:
            arrs = []
            possible_len = self.kasiskiExamination(message)
            possible_len.sort()
            logger.trace(f"Got possible lengths {possible_len}")
            # TODO: work out length
            for i in possible_len:
                arrs.extend(
                    self.crackOne(
                        message,
                        self.cache.get_or_update(
                            ctext,
                            f"vigenere::{i}",
                            lambda: cipheycore.analyse_string(ctext, i, self.group),
                        ),
                    )
                )

            logger.debug(f"Vigenere returned {len(arrs)} candidates")
            return arrs

    @staticmethod
    def getParams() -> Optional[Dict[str, ParamSpec]]:
        return {
            "expected": ciphey.iface.ParamSpec(
                desc="The expected distribution of the plaintext",
                req=False,
                config_ref=["default_dist"],
            ),
            "group": ciphey.iface.ParamSpec(
                desc="An ordered sequence of chars that make up the vigenere cipher alphabet",
                req=False,
                default="abcdefghijklmnopqrstuvwxyz",
            ),
            "lower": ciphey.iface.ParamSpec(
                desc="Whether or not the ciphertext should be converted to lowercase first",
                req=False,
                default=True,
            ),
            "keysize": ciphey.iface.ParamSpec(
                desc="A key size that should be used. If not given, will attempt to work it out",
                req=False,
            ),
            "p_value": ciphey.iface.ParamSpec(
                desc="The p-value to use for windowed frequency analysis",
                req=False,
                default=0.99,
            ),
        }

    def __init__(self, config: ciphey.iface.Config):
        super().__init__(config)
        self.lower: Union[str, bool] = self._params()["lower"]
        if type(self.lower) != bool:
            self.lower = util.strtobool(self.lower)
        self.group = list(self._params()["group"])
        self.expected = config.get_resource(self._params()["expected"])
        self.cache = config.cache
        self.keysize = self._params().get("keysize")
        if self.keysize is not None:
            self.keysize = int(self.keysize)
        self.MAX_KEY_LENGTH = 16  # Will not attempt keys longer than this.
        self.p_value = self._params()["p_value"]

    def kasiskiExamination(self, ciphertext) -> List[int]:
        # Find out the sequences of 3 to 5 letters that occur multiple times
        # in the ciphertext. repeatedSeqSpacings has a value like:
        # {'EXG': [192], 'NAF': [339, 972, 633], ... }
        repeatedSeqSpacings = self.findRepeatSequencesSpacings(ciphertext)

        max_factor = len(ciphertext) // 3

        # (See getMostCommonFactors() for a description of seqFactors.)
        seqFactors = {}
        for seq in repeatedSeqSpacings:
            seqFactors[seq] = []
            for spacing in repeatedSeqSpacings[seq]:
                seqFactors[seq].extend(self.getUsefulFactors(spacing, max_factor))

        # (See getMostCommonFactors() for a description of factorsByCount.)
        factorsByCount = self.getMostCommonFactors(seqFactors)

        # Now we extract the factor counts from factorsByCount and
        # put them in allLikelyKeyLengths so that they are easier to
        # use later:
        allLikelyKeyLengths = []
        for twoIntTuple in factorsByCount:
            allLikelyKeyLengths.append(twoIntTuple[0])

        return allLikelyKeyLengths

    def findRepeatSequencesSpacings(self, message):
        # Goes through the message and finds any 3 to 5 letter sequences
        # that are repeated. Returns a dict with the keys of the sequence and
        # values of a list of spacings (num of letters between the repeats).

        # Compile a list of seqLen-letter sequences found in the message:
        seqSpacings = {}  # Keys are sequences, values are lists of int spacings.
        for seqLen in range(3, 6):
            for seqStart in range(len(message) - seqLen):
                # Determine what the sequence is, and store it in seq:
                seq = message[seqStart : seqStart + seqLen]

                # Look for this sequence in the rest of the message:
                for i in range(seqStart + seqLen, len(message) - seqLen):
                    if message[i : i + seqLen] == seq:
                        # Found a repeated sequence.
                        if seq not in seqSpacings:
                            seqSpacings[seq] = []  # Initialize a blank list.

                        # Append the spacing distance between the repeated
                        # sequence and the original sequence:
                        seqSpacings[seq].append(i - seqStart)
        return seqSpacings

    def getUsefulFactors(self, num, max_factor: int):
        # Returns a list of useful factors of num. By "useful" we mean factors
        # less than MAX_KEY_LENGTH + 1 and not 1. For example,
        # getUsefulFactors(144) returns [2, 3, 4, 6, 8, 9, 12, 16]

        if num < 2:
            return []  # Numbers less than 2 have no useful factors.

        factors = set()  # The set of factors found.

        # When finding factors, you only need to check the integers up to
        # MAX_KEY_LENGTH.
        #
        # Mathematician note: whilst this is *definitely* suboptimal,
        # for small numbers it's probably as good as other methods
        for i in range(2, min(max_factor, num)):  # Don't test 1: it's not useful.
            if num % i == 0:
                factors.add(i)
                otherFactor = num // i
                if otherFactor < self.MAX_KEY_LENGTH + 1 and otherFactor != 1:
                    factors.add(otherFactor)
        return list(factors)

    def getMostCommonFactors(self, seqFactors):
        # First, get a count of how many times a factor occurs in seqFactors:
        factorCounts = {}  # Key is a factor, value is how often it occurs.

        # seqFactors keys are sequences, values are lists of factors of the
        # spacings. seqFactors has a value like: {'GFD': [2, 3, 4, 6, 9, 12,
        # 18, 23, 36, 46, 69, 92, 138, 207], 'ALW': [2, 3, 4, 6, ...], ...}
        for seq in seqFactors:
            factorList = seqFactors[seq]
            for factor in factorList:
                if factor not in factorCounts:
                    factorCounts[factor] = 0
                factorCounts[factor] += 1

        # Second, put the factor and its count into a tuple, and make a list
        # of these tuples so we can sort them:
        factorsByCount = []
        for factor in factorCounts:
            # Exclude factors larger than MAX_KEY_LENGTH:
            if factor <= self.MAX_KEY_LENGTH:
                # factorsByCount is a list of tuples: (factor, factorCount)
                # factorsByCount has a value like: [(3, 497), (2, 487), ...]
                factorsByCount.append((factor, factorCounts[factor]))

        # Sort the list by the factor count:
        factorsByCount.sort(key=lambda x: x[1], reverse=True)

        return factorsByCount
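The factor tallying done by `kasiskiExamination` and `getMostCommonFactors` boils down to this sketch (illustrative helper; the test uses the spacing example from the comment above, {'EXG': [192], 'NAF': [339, 972, 633]}):

```python
from collections import Counter

def likely_key_lengths(spacings, max_key_length=16):
    """Count small factors of each spacing; frequent factors are likely key lengths."""
    counts = Counter()
    for spacing in spacings:
        for f in range(2, max_key_length + 1):
            if spacing % f == 0:
                counts[f] += 1
    # most common factor first, mirroring the factorsByCount sort
    return [factor for factor, _ in counts.most_common()]
```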
@@ -0,0 +1 @@
from . import morse, bases, unicode, reverse
@@ -0,0 +1,45 @@
import base64
import binascii
import types

from typing import Callable, Optional, Any, Dict

from loguru import logger

import ciphey


def _dispatch(self: Any, ctext: str, func: Callable[[str], bytes]) -> Optional[bytes]:
    logger.trace(f"Attempting {self.getTarget()}")

    try:
        result = func(ctext)
        logger.debug(f"{self.getTarget()} successful, returning {result}")
        return result
    except ValueError:
        logger.trace(f"Failed to decode {self.getTarget()}")
        return None


_bases = {
    "base16": (base64.b16decode, 0.4),
    "base32": (base64.b32decode, 0.01),
    "base64": (base64.b64decode, 0.4),
    "base85": (base64.b85decode, 0.01),
    "ascii85": (base64.a85decode, 0.1),
}


def gen_class(name, decoder, priority, ns):
    ns["_get_func"] = ciphey.common.id_lambda(decoder)
    ns["decode"] = lambda self, ctext: _dispatch(self, ctext, self._get_func())
    ns["getParams"] = ciphey.common.id_lambda(None)
    ns["getTarget"] = ciphey.common.id_lambda(name)
    ns["priority"] = ciphey.common.id_lambda(priority)
    ns["__init__"] = lambda self, config: super(type(self), self).__init__(config)


for name, (decoder, priority) in _bases.items():
    t = types.new_class(
        name,
        (ciphey.iface.Decoder[str, bytes],),
        exec_body=lambda ns: gen_class(name, decoder, priority, ns),
    )

    ciphey.iface.registry.register(t)
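The `_dispatch` pattern can be exercised on its own: try a decoder and turn a decode failure into None. (`binascii.Error` is a `ValueError` subclass, which is why the module's single `except ValueError` clause is enough; the helper name below is illustrative.)

```python
import base64

def try_decode(ctext, decoder):
    # mirror _dispatch: a clean decode wins, a failure becomes None
    try:
        return decoder(ctext)
    except ValueError:
        return None
```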
@@ -0,0 +1,101 @@
from typing import Optional, Dict, Any, List
import re
from loguru import logger
import ciphey
from ciphey.iface import registry


@registry.register
class MorseCode(ciphey.iface.Decoder[str, str]):
    # A priority list for char/word boundaries
    BOUNDARIES = {" ": 1, "/": 2, "\n": 3, ".": -1, "-": -1}
    MAX_PRIORITY = 3
    ALLOWED = {".", "-", " ", "/", "\n"}
    MORSE_CODE_DICT: Dict[str, str]
    MORSE_CODE_DICT_INV: Dict[str, str]

    @staticmethod
    def getTarget() -> str:
        return "morse"

    def decode(self, text: str) -> Optional[str]:
        # Trim trailing separators. NOTE: the original stripped anything in
        # BOUNDARIES, which also contains "." and "-" (priority -1) and would
        # eat the end of the message; only positive-priority chars are separators.
        while text and self.BOUNDARIES.get(text[-1], 0) > 0:
            text = text[:-1]

        logger.trace("Attempting morse code")

        char_boundary = word_boundary = None
        char_priority = word_priority = 0
        # Custom loop allows early break
        for i in text:
            i_priority = self.BOUNDARIES.get(i)
            if i_priority is None:
                logger.trace(f"Non-morse char '{i}' found")
                return None

            if i_priority <= char_priority or i == char_boundary or i == word_boundary:
                continue
            # Default to having a char boundary over a word boundary
            if (
                i_priority > word_priority
                and word_boundary is None
                and char_boundary is not None
            ):
                word_priority = i_priority
                word_boundary = i
                continue
            char_priority = i_priority
            char_boundary = i

        logger.trace(
            f"Char boundary is '{char_boundary}', and word boundary is '{word_boundary}'"
        )

        result = ""

        for word in text.split(word_boundary) if word_boundary else [text]:
            logger.trace(f"Attempting to decode word {word}")
            for char in word.split(char_boundary):
                try:
                    m = self.MORSE_CODE_DICT_INV[char]
                except KeyError:
                    logger.trace(f"Invalid codeword '{word}' found")
                    return None
|
||||
result = result + m
|
||||
# after every word add a space
|
||||
result = result + " "
|
||||
if len(result) == 0:
|
||||
logger.trace(f"Morse code failed to match")
|
||||
return None
|
||||
# Remove trailing space
|
||||
result = result[:-1]
|
||||
logger.debug(f"Morse code successful, returning {result}")
|
||||
return result.strip().upper()
|
||||
|
||||
@staticmethod
|
||||
def getParams() -> Optional[Dict[str, ciphey.iface.ParamSpec]]:
|
||||
return {
|
||||
"dict": ciphey.iface.ParamSpec(
|
||||
desc="The morse code dictionary to use",
|
||||
req=False,
|
||||
default="cipheydists::translate::morse",
|
||||
)
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def getName() -> str:
|
||||
return "morse"
|
||||
|
||||
@staticmethod
|
||||
def priority() -> float:
|
||||
return 0.05
|
||||
|
||||
def __init__(self, config: ciphey.iface.Config):
|
||||
super().__init__(config)
|
||||
self.MORSE_CODE_DICT = config.get_resource(
|
||||
self._params()["dict"], ciphey.iface.WordList
|
||||
)
|
||||
self.MORSE_CODE_DICT_INV = {v: k for k, v in self.MORSE_CODE_DICT.items()}
|
|
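Once boundaries are found, the decode loop above is essentially an inverted-dictionary lookup per codeword. A minimal sketch with a tiny hand-written table (the real class inverts the full `cipheydists::translate::morse` dictionary the same way):

```python
from typing import Optional

# Toy inverse morse table; MorseCode builds MORSE_CODE_DICT_INV identically.
MORSE_CODE_DICT = {"H": "....", "E": ".", "L": ".-..", "O": "---"}
MORSE_CODE_DICT_INV = {v: k for k, v in MORSE_CODE_DICT.items()}


def decode_word(word: str, char_boundary: str = " ") -> Optional[str]:
    out = []
    for code in word.split(char_boundary):
        sym = MORSE_CODE_DICT_INV.get(code)
        if sym is None:
            # Mirror the class's early exit on an invalid codeword
            return None
        out.append(sym)
    return "".join(out)


print(decode_word(".... . .-.. .-.. ---"))  # HELLO
print(decode_word("...---..."))             # None (no boundary, so no valid codeword)
```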
@@ -0,0 +1,24 @@
from typing import Optional, Dict, List

from ciphey.iface import ParamSpec, Config, T, U, Decoder, registry


@registry.register_multi((str, str), (bytes, bytes))
class Reverse(Decoder):
    def decode(self, ctext: T) -> Optional[U]:
        return ctext[::-1]

    @staticmethod
    def priority() -> float:
        return 0.05

    def __init__(self, config: Config):
        super().__init__(config)

    @staticmethod
    def getParams() -> Optional[Dict[str, ParamSpec]]:
        return None

    @staticmethod
    def getTarget() -> str:
        return "reverse"

@@ -0,0 +1,38 @@
from typing import Optional, Dict, Any

from loguru import logger

import ciphey
from ciphey.iface import registry


@registry.register
class Utf8(ciphey.iface.Decoder[bytes, str]):
    @staticmethod
    def getTarget() -> str:
        return "utf8"

    def decode(self, text: bytes) -> Optional[str]:
        logger.trace("Attempting utf-8 decode")
        try:
            res = text.decode("utf8")
            logger.debug(f"utf-8 decode gave '{res}'")
            return res if len(res) != 0 else None
        except UnicodeDecodeError:
            logger.trace("utf-8 decode failed")
            return None

    @staticmethod
    def getParams() -> Optional[Dict[str, Dict[str, Any]]]:
        return None

    @staticmethod
    def getName() -> str:
        return "UTF-8"

    @staticmethod
    def priority() -> float:
        return 0.9

    def __init__(self, config: ciphey.iface.Config):
        super().__init__(config)

@@ -0,0 +1 @@
from . import cipheydists, files

@@ -0,0 +1,39 @@
from typing import Optional, Dict, Any, Set

from functools import lru_cache

import loguru

import ciphey
import cipheydists
from ciphey.iface import ParamSpec, Config, registry, WordList, Distribution


@registry.register_multi(WordList, Distribution)
class CipheyDists(ciphey.iface.ResourceLoader):
    # _wordlists: Set[str] = frozenset({"english", "english1000", "englishStopWords"})
    # _brandons: Set[str] = frozenset({"english"})
    # _dists: Set[str] = frozenset({"twist"})
    # _translates: Set[str] = frozenset({"morse"})
    _getters = {
        "list": cipheydists.get_list,
        "dist": cipheydists.get_dist,
        "brandon": cipheydists.get_brandon,
        "translate": cipheydists.get_translate,
    }

    def whatResources(self) -> Optional[Set[str]]:
        # None signals that this loader cannot enumerate its resources
        return None

    @lru_cache
    def getResource(self, name: str) -> Any:
        loguru.logger.trace(f"Loading cipheydists resource {name}")
        prefix, name = name.split("::", 1)
        return self._getters[prefix](name)

    def __init__(self, config: Config):
        super().__init__(config)

    @staticmethod
    def getParams() -> Optional[Dict[str, ParamSpec]]:
        return None

@@ -0,0 +1,65 @@
from typing import Optional, Dict, Any, Set, Generic, Type

from functools import lru_cache

import ciphey
from ciphey.iface import T, ParamSpec, Config, get_args, registry

import json
import csv


# We can use a generic resource loader here, as we can instantiate it later
@registry.register_multi(ciphey.iface.WordList, ciphey.iface.Distribution)
class Json(ciphey.iface.ResourceLoader):
    def whatResources(self) -> Set[str]:
        return self._names

    @lru_cache
    def getResource(self, name: str) -> T:
        prefix, name = name.split("::", 1)
        # wordlist: a JSON array of words becomes a set; dist: the JSON object is used as-is
        return {"wordlist": (lambda js: set(js)), "dist": (lambda js: js)}[prefix](
            json.load(open(self._paths[int(name) - 1]))
        )

    @staticmethod
    def getName() -> str:
        return "json"

    @staticmethod
    def getParams() -> Optional[Dict[str, ciphey.iface.ParamSpec]]:
        return {"path": ParamSpec(req=True, desc="The path to a JSON file", list=True)}

    def __init__(self, config: ciphey.iface.Config):
        super().__init__(config)
        self._paths = self._params()["path"]
        # Resources are addressed 1-based, so the range must include len(self._paths)
        self._names = set(range(1, len(self._paths) + 1))


# We can use a generic resource loader here, as we can instantiate it later
@registry.register_multi(ciphey.iface.WordList, ciphey.iface.Distribution)
class Csv(Generic[T], ciphey.iface.ResourceLoader[T]):
    def whatResources(self) -> Set[str]:
        return self._names

    @lru_cache
    def getResource(self, name: str) -> T:
        prefix, name = name.split("::", 1)
        return {
            "wordlist": (lambda reader: {i[0] for i in reader}),
            "dist": (lambda reader: {i[0]: float(i[1]) for i in reader}),
        }[prefix](csv.reader(open(self._paths[int(name) - 1])))

    @staticmethod
    def getName() -> str:
        return "csv"

    @staticmethod
    def getParams() -> Optional[Dict[str, ciphey.iface.ParamSpec]]:
        return {"path": ParamSpec(req=True, desc="The path to a CSV file", list=True)}

    def __init__(self, config: ciphey.iface.Config):
        super().__init__(config)
        self._paths = self._params()["path"]
        # Resources are addressed 1-based, so the range must include len(self._paths)
        self._names = set(range(1, len(self._paths) + 1))

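The `Csv` loader builds a wordlist or a distribution with a one-line comprehension over `csv.reader`. A standalone sketch of both branches using an in-memory file in place of a path (the sample rows are made up):

```python
import csv
import io

# Sample rows, standing in for a file opened from self._paths.
data = "hello,0.6\nworld,0.4\n"

# "dist" branch: word -> probability mapping.
dist = {row[0]: float(row[1]) for row in csv.reader(io.StringIO(data))}
print(dist)  # {'hello': 0.6, 'world': 0.4}

# "wordlist" branch: just the first column, as a set.
wordlist = {row[0] for row in csv.reader(io.StringIO(data))}
print(wordlist)
```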
@@ -0,0 +1 @@
from . import ausearch, perfection

@@ -0,0 +1,183 @@
import cipheycore


class Node:
    """
    A node has a value associated with it
    Calculated from the heuristic
    """

    def __init__(
        self, config=None, h: float = None, edges: list = None, ctext: str = None,
    ):
        self.weight = h
        # Edges is a list of (node, weight) pairs this node can connect to
        self.edges = edges
        self.ctext = ctext
        self.h = h
        self.path = []
        # Cache the info content of the ctext when a config is supplied;
        # fall back to the raw heuristic for scratch nodes without one
        self.information_content = (
            config.cache.get_or_update(
                self.ctext,
                "cipheycore::info_content",
                lambda: cipheycore.info_content(self.ctext),
            )
            if config is not None
            else h
        )

    def __le__(self, node2):
        # if self is less than or equal to the other node
        return self.h <= node2.h

    def __lt__(self, node2):
        return self.h < node2.h

    def append_edge(self, edge):
        self.edges.append(edge)

    def get_edges(self):
        return self.edges


class Graph:
    # example of adjacency list (or rather map)
    # adjacency_list = {
    #     'A': [('B', 1), ('C', 3), ('D', 7)],
    #     'B': [('D', 5)],
    #     'C': [('D', 12)]
    # }

    def __init__(self, adjacency_list, original_ctext: str = ""):
        """
        adjacency_list: basically the graph
        """
        self.adjacency_list = adjacency_list
        # Information content of the original input, used to normalise the
        # heuristic ("or 1" avoids dividing by zero when no ctext is given)
        self.original_input = cipheycore.info_content(original_ctext) or 1

    def get_neighbors(self, v):
        try:
            return self.adjacency_list[v]
        except KeyError:
            # If we have exhausted the adjacency list
            return []

    # Heuristic: information content relative to the original input
    def heuristic(self, n: Node):
        return n.information_content / self.original_input

    def a_star_algorithm(self, start_node: Node, stop_node: Node):
        # TODO store the graph as an attribute
        # open_list is a set of nodes which have been visited, but whose neighbours
        # haven't all been inspected; it starts off with the start node.
        # closed_list is a set of nodes which have been visited
        # and whose neighbours have been inspected.
        open_list = {start_node}
        closed_list = set()

        # g contains current distances from start_node to all other nodes
        # the default value (if it's not found in the map) is +infinity
        g = {start_node: 0}

        # parents contains an adjacency map of all nodes
        parents = {start_node: start_node}

        while len(open_list) > 0:
            print(f"The open list is {open_list}")
            n = None

            # find the node with the lowest value of f() - the evaluation function
            for v in open_list:
                # TODO if v == decoder, run the decoder
                print(f"The for loop node v is {v}")
                if n is None or g[v] + self.heuristic(v) < g[n] + self.heuristic(n):
                    n = v
            print(f"The value of n is {n}")

            if n is None:
                print("Path does not exist!")
                return None

            # if the current node is the stop_node,
            # then we begin reconstructing the path from it to the start_node
            # NOTE Uncomment this for an exit condition
            # TODO Make it exit if the decryptor returns True
            # TODO We need to append the decryption methods to each node,
            # so when we reconstruct the path we can reconstruct the decryptions used
            if n == stop_node:
                print("n is the stop node, we are stopping!")
                reconst_path = []

                while parents[n] != n:
                    reconst_path.append(n)
                    n = parents[n]

                reconst_path.append(start_node)

                reconst_path.reverse()

                print("Path found: {}".format(reconst_path))
                return reconst_path

            print(n)
            for (m, weight) in self.get_neighbors(n):
                print(f"And the iteration is ({m}, {weight})")
                # if the current node isn't in either open_list or closed_list,
                # add it to open_list and note n as its parent
                if m not in open_list and m not in closed_list:
                    open_list.add(m)
                    parents[m] = n
                    g[m] = g[n] + weight

                # otherwise, check if it's quicker to first visit n, then m;
                # if it is, update parent data and g data,
                # and if the node was in the closed_list, move it to open_list
                else:
                    if g[m] > g[n] + weight:
                        g[m] = g[n] + weight
                        parents[m] = n

                        if m in closed_list:
                            closed_list.remove(m)
                            open_list.add(m)

            # remove n from the open_list, and add it to closed_list,
            # because all of its neighbours were inspected
            open_list.remove(n)
            closed_list.add(n)
            print("\n")

        print("Path does not exist!")
        return None


adjacency_list = {
    "A": [("B", 1), ("C", 3), ("D", 7)],
    "B": [("D", 5)],
    "C": [("D", 12)],
}
A = Node(h=1)
B = Node(h=7)
C = Node(h=9)
D = Node(h=16)

A.edges = [(B, 1), (C, 3), (D, 7)]
B.edges = [(D, 5)]
C.edges = [(D, 12)]

# TODO use a dictionary comprehension to make this
adjacency_list = {
    A: A.edges,
    B: B.edges,
    C: C.edges,
}
graph1 = Graph(adjacency_list)
graph1.a_star_algorithm(A, D)

"""
Maybe after it
"""

@@ -0,0 +1,174 @@
function reconstruct_path(cameFrom, current)
    total_path := {current}
    while current in cameFrom.Keys:
        current := cameFrom[current]
        total_path.prepend(current)
    return total_path

// A* finds a path from start to goal.
// h is the heuristic function. h(n) estimates the cost to reach goal from node n.
function A_Star(graph, start, goal, h)
    // The set of discovered nodes that may need to be (re-)expanded.
    // Initially, only the start node is known.
    // This is usually implemented as a min-heap or priority queue rather than a hash-set.
    openSet := {start}

    // For node n, cameFrom[n] is the node immediately preceding it on the cheapest path from start
    // to n currently known.
    cameFrom := an empty map

    // For node n, gScore[n] is the cost of the cheapest path from start to n currently known.
    gScore := map with default value of Infinity
    gScore[start] := 0

    // For node n, fScore[n] := gScore[n] + h(n). fScore[n] represents our current best guess as to
    // how short a path from start to finish can be if it goes through n.
    fScore := map with default value of Infinity
    fScore[start] := h(start)

    // the exit condition is set to True when LC (the language checker) returns True
    exit_condition = False

    while not exit_condition
        // This operation can occur in O(1) time if openSet is a min-heap or a priority queue
        current := the node in openSet having the lowest fScore[] value
        if current = goal
            return reconstruct_path(cameFrom, current)

        openSet.Remove(current)
        for each neighbor of current
            decodings = neighbor.decoders()

            // d(current, neighbor) is the weight of the edge from current to neighbor
            // tentative_gScore is the distance from start to the neighbor through current
            tentative_gScore := gScore[current] + d(current, neighbor)
            if tentative_gScore < gScore[neighbor]
                // This path to neighbor is better than any previous one. Record it!
                cameFrom[neighbor] := current
                gScore[neighbor] := tentative_gScore
                fScore[neighbor] := gScore[neighbor] + h(neighbor)
                if neighbor not in openSet
                    openSet.add(neighbor)

            # run the cracker on the object
            crack(node.ctext)
            if crack:
                # if the cracker returns true, reconstruct the path and exit
                exit_condition = True
                reconstruct(start, node)
            else:
                # else add the new children of the cracker to openSet
                openSet.append(node: crack)

    // Open set is empty but goal was never reached
    return failure

function calculate_new_children(node):
    // TODO: not yet implemented


class Node:
    """
    A node has a value associated with it
    Calculated from the heuristic
    """

    def __init__(self, config=None, h: float = None, edges: list = None, ctext: str = None):
        self.weight = h
        # Edges is a list of (node, weight) pairs this node can connect to
        self.edges = edges
        self.ctext = ctext
        self.h = h
        self.path = []
        # Cache the info content of the ctext when a config is supplied;
        # fall back to the raw heuristic for scratch nodes without one
        self.information_content = (
            config.cache.get_or_update(
                self.ctext,
                "cipheycore::info_content",
                lambda: cipheycore.info_content(self.ctext),
            )
            if config is not None
            else h
        )

    def __le__(self, node2):
        # if self is less than or equal to the other node
        return self.h <= node2.h

    def __lt__(self, node2):
        return self.h < node2.h

    def append_edge(self, edge):
        self.edges.append(edge)

    def get_edges(self):
        return self.edges

@@ -0,0 +1,183 @@
from abc import abstractmethod, ABC
from typing import Generic, List, Optional, Dict, Any, NamedTuple, Union, Set, Tuple
from ciphey.iface import (
    T,
    Cracker,
    Config,
    Searcher,
    ParamSpec,
    CrackInfo,
    registry,
    SearchLevel,
    CrackResult,
    SearchResult,
    Decoder,
    DecoderComparer,
)
from datetime import datetime
from loguru import logger


class Node(Generic[T], NamedTuple):
    cracker: Cracker
    parents: List[SearchLevel]
    crack_info: CrackInfo
    check_info: float

    def __hash__(self):
        return hash((type(self.cracker).__name__, len(self.parents)))


class AuSearch(Searcher, ABC):
    @abstractmethod
    def findBestNode(self, nodes: Set[Node]) -> Node:
        pass

    def handleDecodings(
        self, target: Any
    ) -> (bool, Union[Tuple[SearchLevel, str], List[SearchLevel]]):
        """
        If there exists a decoding that the checker returns true on, returns (True, result).
        Otherwise, returns (False, names and successful decodings)

        The CrackResult object should only have the value field filled in

        MUST NOT recurse into decodings! evaluate does that for you!
        """
        # This tag is necessary, as we could have a list as a decoding target,
        # which would then screw over type checks
        ret = []

        decoders = []

        for decoder_type, decoder_class in registry[Decoder][type(target)].items():
            for decoder in decoder_class:
                decoders.append(DecoderComparer(decoder))
        # Fun fact:
        # with Python's glorious lists, inserting n elements into the right position
        # (with bisect) is O(n^2)
        decoders.sort(reverse=True)

        for decoder_cmp in decoders:
            logger.trace(f"Inspecting {decoder_cmp}")
            res = self._config()(decoder_cmp.value).decode(target)
            if res is None:
                continue
            level = SearchLevel(
                name=decoder_cmp.value.__name__.lower(),
                result=CrackResult(value=res),
            )
            if type(res) == self._final_type:
                check_res = self._checker(res)
                if check_res is not None:
                    return True, (level, check_res)
            ret.append(level)
        return False, ret

    def expand(
        self, parents: List[SearchLevel], check: bool = True
    ) -> (bool, Union[SearchResult, List[Node]]):
        result = parents[-1].result.value
        # logger.debug(f"Expanding {parents}")

        # Deduplication
        if not self._config().cache.mark_ctext(result):
            return False, []

        if check and type(result) == self._final_type:
            check_res = self._checker(result)
            if check_res is not None:
                return True, SearchResult(path=parents, check_res=check_res)

        success, dec_res = self.handleDecodings(result)
        if success:
            return True, SearchResult(path=parents + [dec_res[0]], check_res=dec_res[1])

        nodes: List[Node] = []

        for decoding in dec_res:
            # Don't check, as handleDecodings did that for us
            success, eval_res = self.expand(parents + [decoding], check=False)
            if success:
                return True, eval_res
            nodes.extend(eval_res)

        crackers: List[Cracker] = registry[Cracker[type(result)]]
        expected_time: float

        # Worth doing this check twice to simplify the code and allow an early
        # return for decodings
        if type(result) == self._final_type:
            expected_time = self._checker.getExpectedRuntime(result)
        else:
            expected_time = 0
        for i in crackers:
            cracker = self._config()(i)
            nodes.append(
                Node(
                    cracker=cracker,
                    crack_info=cracker.getInfo(result),
                    check_info=expected_time,
                    parents=parents,
                )
            )

        return False, nodes

    def evaluate(self, node: Node) -> (bool, Union[List[SearchLevel], List[Node]]):
        # logger.debug(f"Evaluating {node}")

        res = node.cracker.attemptCrack(node.parents[-1].result.value)
        # Detect if we succeeded, and if deduplication is needed
        logger.trace(f"Got {len(res)} results")

        ret = []
        for i in res:
            success, res = self.expand(
                node.parents
                + [SearchLevel(name=type(node.cracker).__name__.lower(), result=i)]
            )
            if success:
                return True, res
            ret.extend(res)

        return False, ret

    def search(self, ctext: Any) -> List[SearchLevel]:
        deadline = (
            datetime.now() + self._config().objs["timeout"]
            if self._config().timeout is not None
            else datetime.max
        )

        success, expand_res = self.expand(
            [SearchLevel(name="input", result=CrackResult(value=ctext))]
        )
        if success:
            return expand_res

        nodes = set(expand_res)

        while datetime.now() < deadline:
            # logger.trace(f"Have node tree {nodes}")

            if len(nodes) == 0:
                raise LookupError("Could not find any solutions")

            best_node = self.findBestNode(nodes)
            nodes.remove(best_node)
            success, eval_res = self.evaluate(best_node)
            if success:
                # logger.trace(f"Success with node {best_node}")
                return eval_res
            nodes.update(eval_res)

        raise TimeoutError("Search ran out of time")

    @staticmethod
    @abstractmethod
    def getParams() -> Optional[Dict[str, ParamSpec]]:
        pass

    @abstractmethod
    def __init__(self, config: Config):
        super().__init__(config)
        self._checker = config.objs["checker"]
        self._final_type = config.objs["format"]["out"]

@@ -0,0 +1,233 @@
import heapq


class Imperfection:
    """The graph is a Node: [List of nodes]
    Where each item in the list of nodes can also have a node with a list of nodes

    The result is that we can keep track of edges, while also keeping it small

    To calculate current, we push the entire graph to A*

    And it calculates the next node to choose, as well as increasing the size
    of the graph with values

    We're using a heap, meaning the element at [0] is always the smallest element

    So we choose that and return it.


    The current A* implementation has an end; we simply do not let it end, as the
    language checker will make it end far before it reaches Searcher again.

    Current is the start position, so if we say we always start at the start of
    the graph, it'll go through the entire graph

    graph = {
        Node: [
            {Node :
                {
                    node
                }
            }
        ]
    }

    For encodings we just do them straight out

    The last value of parents from abstract
    """

    """
    graph = {'A': ['B', 'C'],
             'B': ['C', 'D'],
             'C': ['D'],
             'D': ['C'],
             'E': ['F'],
             'F': ['C']}"""

    def __init__(self):
        pass

    def findBestNode(self, nodes):
        """Finds the best decryption module"""
        return next(iter(nodes))

    # def aStar(self, graph, current, end):
    #     """The A* search algorithm

    #     We're using heaps to find the minimum element (the one that will be the next current)
    #     Heaps keep the lowest element at [0], so peeking the minimum is O(1)
    #     Sets insert in O(1); heaps insert in O(log N)

    #     https://stackoverflow.com/questions/4159331/python-speed-up-an-a-star-pathfinding-algorithm

    #     Current appears to be the list of all new tiles we can reach from the current location

    #     End is the end node; it won't actually run, because LC will make it return
    #     before it hits the aStar function, so I'll just make it infinite unless
    #     something else forces a return

    #     The graph is the actual data structure used. According to StackOverflow, it looks like this:

    #     graph = {'A': ['B', 'C'],
    #              'B': ['C', 'D'],
    #              'C': ['D'],
    #              'D': ['C'],
    #              'E': ['F'],
    #              'F': ['C']}

    #     """

    #     # Runs decodings first

    #     openSet = set()
    #     openHeap = []
    #     closedSet = set()

    #     def retracePath(c):
    #         # Retraces a path back to the start
    #         path = [c]
    #         while c.parent is not None:
    #             c = c.parent
    #             path.append(c)
    #         path.reverse()
    #         return path

    #     # Adds the current location (start) to the heap and set
    #     openSet.add(current)
    #     openHeap.append((0, current))

    #     # while openSet contains items
    #     while openSet:
    #         # TODO change openSet to a heap?
    #         # gets the 2nd element from the first element of the heap
    #         # so the heap is (0, current)
    #         # which means we pop current
    #         current = heapq.heappop(openHeap)[1]
    #         # We don't actually want to end, so I'm commenting this:
    #         # XXX
    #         if current == end:
    #             return retracePath(current)
    #         # Removes it from todo and into done, I think
    #         # closedSet appears to be the set of things we have done
    #         openSet.remove(current)
    #         closedSet.add(current)

    #         """
    #         Okay so our graph looks like this:
    #         graph = {
    #             Node: [
    #                 {Node :
    #                     {
    #                         node
    #                     }
    #                 }
    #             ]
    #         }
    #         graph[current] **SHOULD** be the list of nodes which contains dictionaries of nodes
    #         """
    #         for tile in graph[current]:
    #             # closedSet appears to be the list of visited nodes
    #             # TODO place this as a class attribute
    #             if tile not in closedSet:
    #                 # This is the heuristic
    #                 # TODO expected_time/probability + k * heuristic, for some experimentally determined value of k
    #                 tile.H = (abs(end.x - tile.x) + abs(end.y - tile.y)) * 10

    #                 # if tile is not in the openSet, add it and then push it onto the heap
    #                 if tile not in openSet:
    #                     openSet.add(tile)
    #                     heapq.heappush(openHeap, (tile.H, tile))
    #                 tile.parent = current

    #     # This returns nothing
    #     # I need to modify it so it finds the best item from current
    #     # So basically, return item 0 of openHeap
    #     # return openHeap[0]
    #     # Since the [0] item is always the minimum
    #     return []
    def aStar(self, graph, current, end):
        print(f"The graph is {graph}\nCurrent is {current}\n and End is {end}")
        openSet = set()
        openHeap = []
        closedSet = set()

        def retracePath(c):
            print("Calling retrace path")
            path = [c]
            while c.parent is not None:
                c = c.parent
                path.append(c)
            path.reverse()
            return path

        print("\n")

        # The start node has no parent, which is retracePath's stop condition
        current.parent = None
        openSet.add(current)
        openHeap.append((0, current))
        while openSet:
            print(f"Openset is {openSet}")
            print(f"OpenHeap is {openHeap}")
            print(f"ClosedSet is {closedSet}")
            print(f"Current is {current}")
            print(f"I am popping {openHeap} with the first element")
            current = heapq.heappop(openHeap)[1]
            print(f"Current is now {current}")
            print(f"Graph current is {graph.get(current, [])}")
            if current == end:
                return retracePath(current)
            openSet.remove(current)
            closedSet.add(current)
            for tile in graph.get(current, []):
                if tile not in closedSet:
                    tile.H = (abs(end.x - tile.x) + abs(end.y - tile.y)) * 10
                    tile.H = 1
                    if tile not in openSet:
                        openSet.add(tile)
                        heapq.heappush(openHeap, (tile.H, tile))
                    tile.parent = current
            print("\n")
        return []


class Node:
    """
    A node has a value associated with it
    Calculated from the heuristic
    """

    def __init__(self, h):
        self.h = h
        self.x = self.h
        self.y = 0.6

    def __le__(self, node2):
        # if self is less than or equal to the other node
        return self.x <= node2.x

    def __lt__(self, node2):
        return self.x < node2.x


if __name__ == "__main__":
    obj = Imperfection()
    graph = {
        "A": ["B", "C"],
        "B": ["C", "D"],
        "C": ["D"],
        "D": ["C"],
        "E": ["F"],
        "F": ["C"],
    }
    # Makes the graph
    y = Node(0.5)
    x = Node(0.3)
    p = Node(0.7)
    q = Node(0.9)
    graph = {y: [x, p], p: [q]}

    print(obj.aStar(graph, y, q))

@@ -0,0 +1,31 @@
from abc import abstractmethod
from typing import Set, Any, Union, List, Optional, Dict, Tuple

from loguru import logger

from .ausearch import Node, AuSearch
from ciphey.iface import (
    SearchLevel,
    Config,
    registry,
    CrackResult,
    Searcher,
    ParamSpec,
    Decoder,
    DecoderComparer,
)

import bisect


@registry.register
class Perfection(AuSearch):
    @staticmethod
    def getParams() -> Optional[Dict[str, ParamSpec]]:
        pass

    def findBestNode(self, nodes: Set[Node]) -> Node:
        return next(iter(nodes))

    def __init__(self, config: Config):
        super().__init__(config)
@@ -0,0 +1 @@
from . import Checkers, Crackers, Decoders, Resources, Searchers
@@ -0,0 +1,20 @@
"""Some useful adapters"""
from typing import Any

import cipheycore


def id_lambda(value: Any):
    """
    A function used in dynamic class generation that abstracts away a constant return value (like in getName)
    """
    return lambda *args: value


def cached_freq_analysis(ctext, config):
    # Keep one analysis per ciphertext in the config's shared object store
    base = config.objs.setdefault("cached_freq_analysis", {})
    res = base.get(ctext)
    if res is not None:
        return res

    res = base[ctext] = cipheycore.analyse_string(ctext)
    return res
@@ -0,0 +1,17 @@
from ._config import Config

from ._modules import \
    Decoder, DecoderComparer, \
    Cracker, CrackResult, CrackInfo, \
    Checker, \
    Searcher, SearchResult, SearchLevel, \
    ResourceLoader, \
    ParamSpec, \
    WordList, Distribution, \
    T, U, \
    pretty_search_results

from . import _registry
from ._registry import get_args, get_origin

from ._fwd import registry
@@ -0,0 +1,203 @@
import os
from abc import ABC, abstractmethod
from typing import (
    Any,
    Dict,
    Optional,
    List,
    Type,
    Union, Callable,
)
import pydoc

from loguru import logger

import datetime

import yaml
import appdirs

from . import _fwd
from ._modules import Checker, Searcher, ResourceLoader


class Cache:
    """Used to track state between levels of recursion to stop infinite loops, and to optimise repeating actions"""

    _cache: Dict[Any, Dict[str, Any]] = {}

    def mark_ctext(self, ctext: Any) -> bool:
        if isinstance(ctext, (str, bytes)) and len(ctext) < 4:
            logger.trace(f"Candidate {ctext} too short!")
            return False

        if ctext in self._cache:
            logger.trace(f"Deduped {ctext}")
            return False

        self._cache[ctext] = {}
        return True

    def get_or_update(self, ctext: Any, keyname: str, get_value: Callable[[], Any]):
        # The ctext should have been marked first
        target = self._cache[ctext]
        res = target.get(keyname)
        if res is not None:
            return res

        val = get_value()
        target[keyname] = val
        return val


def split_resource_name(full_name: str) -> (str, str):
    return full_name.split("::", 1)


class Config:
    verbosity: int = 0
    searcher: str = "perfection"
    params: Dict[str, Dict[str, Union[str, List[str]]]] = {}
    format: Dict[str, str] = {"in": "str", "out": "str"}
    modules: List[str] = []
    checker: str = "brandon"
    default_dist: str = "cipheydists::dist::twist"
    timeout: Optional[int] = None

    _inst: Dict[type, Any] = {}
    objs: Dict[str, Any] = {}
    cache: Cache = Cache()

    @staticmethod
    def get_default_dir() -> str:
        return appdirs.user_config_dir("ciphey")

    def merge_dict(self, config_file: Optional[Dict[str, Any]]):
        if config_file is None:
            return
        for a, b in config_file.items():
            self.update(a, b)

    def load_file(self, path: str = os.path.join(get_default_dir.__func__(), "config.yml"), create=False):
        try:
            with open(path, "r+") as file:
                return self.merge_dict(yaml.safe_load(file))
        except FileNotFoundError:
            if create:
                open(path, "w+")

    def instantiate(self, t: type) -> Any:
        """
        Used to enable caching of an instantiated type after the configuration has settled
        """
        # We cannot use setdefault, as that would construct the object again and throw away the result
        res = self._inst.get(t)
        if res is not None:
            return res
        ret = t(self)
        self._inst[t] = ret
        return ret

    def __call__(self, t: type) -> Any:
        return self.instantiate(t)

    def update(self, attrname: str, value: Optional[Any]):
        if value is not None:
            setattr(self, attrname, value)

    def update_param(self, owner: str, name: str, value: Optional[Any]):
        if value is None:
            return

        target = self.params.setdefault(owner, {})

        if _fwd.registry.get_named(owner).getParams()[name].list:
            target.setdefault(name, []).append(value)
        else:
            target[name] = value

    def update_format(self, paramname: str, value: Optional[Any]):
        if value is not None:
            self.format[paramname] = value

    def load_objs(self):
        # Basic type conversion
        if self.timeout is not None:
            self.objs["timeout"] = datetime.timedelta(seconds=int(self.timeout))
        self.objs["format"] = {
            key: pydoc.locate(value) for key, value in self.format.items()
        }

        # Checkers do not depend on anything
        self.objs["checker"] = self(_fwd.registry.get_named(self.checker, Checker))
        # Searchers only depend on checkers
        self.objs["searcher"] = self(_fwd.registry.get_named(self.searcher, Searcher))

    def update_log_level(self, verbosity: Optional[int]):
        if verbosity is None:
            return
        self.verbosity = verbosity
        quiet_list = [
            "ERROR",
            "CRITICAL",
        ]
        loud_list = [
            "DEBUG",
            "TRACE",
        ]
        verbosity_name: str
        if verbosity == 0:
            verbosity_name = "WARNING"
        elif verbosity > 0:
            verbosity_name = loud_list[min(len(loud_list), verbosity) - 1]
        else:
            verbosity_name = quiet_list[min(len(quiet_list), -verbosity) - 1]

        from loguru import logger
        import sys

        logger.remove()
        if self.verbosity is None:
            return
        logger.configure()
        if self.verbosity > 0:
            logger.add(sink=sys.stderr, level=verbosity_name, colorize=sys.stderr.isatty())
            logger.opt(colors=True)
        else:
            logger.add(
                sink=sys.stderr, level=verbosity_name, colorize=False, format="{message}"
            )
        logger.debug(f"Verbosity set to level {verbosity} ({verbosity_name})")

    def load_modules(self):
        import importlib.util

        for i in self.modules:
            spec = importlib.util.spec_from_file_location("ciphey.module_load_site", i)
            mod = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(mod)

    def get_resource(self, res_name: str, t: Optional[Type] = None) -> Any:
        logger.trace(f"Loading resource {res_name} of type {t}")

        # FIXME: Actually returns obj of type `t`, but python is bad
        loader, name = split_resource_name(res_name)
        if t is None:
            return self(_fwd.registry.get_named(loader, ResourceLoader))(name)
        else:
            return self(_fwd.registry.get_named(loader, ResourceLoader[t]))(name)

    def __str__(self):
        return str({
            "verbosity": self.verbosity,
            "searcher": self.searcher,
            "params": self.params,
            "format": self.format,
            "modules": self.modules,
            "checker": self.checker,
            "default_dist": self.default_dist,
            "timeout": self.timeout,
        })


_fwd.config = Config
@@ -0,0 +1,2 @@
registry = None
config = type(None)
@@ -0,0 +1,304 @@
from abc import ABC, abstractmethod
from typing import (
    Any,
    Callable,
    Dict,
    Generic,
    Optional,
    List,
    NamedTuple,
    TypeVar,
    Type,
    Union,
    Set,
)
import pydoc

from loguru import logger

import datetime

from ._fwd import config as Config

T = TypeVar("T")
U = TypeVar("U")


class ParamSpec(NamedTuple):
    """
    Attributes:
        req         Whether this argument is required
        desc        A description of what this argument does
        default     The default value for this argument. Ignored if req == True or config_ref is not None
        config_ref  The path to the config entry that should be used as the default value
        list        Whether this parameter is in the form of a list, and can therefore be specified more than once
        visible     Whether the user can tweak this via the command line
    """

    req: bool
    desc: str
    default: Optional[Any] = None
    list: bool = False
    config_ref: Optional[List[str]] = None
    visible: bool = False


class ConfigurableModule(ABC):
    @staticmethod
    @abstractmethod
    def getParams() -> Optional[Dict[str, ParamSpec]]:
        """
        Returns a dictionary of `argument name: argument specification`
        """
        pass

    def _checkParams(self):
        """
        Fills the given params dict with default values where arguments are not given,
        using None as the default value for default values
        """

        params = self._params()
        config = self._config()

        for key, value in self.getParams().items():
            # If we already have it, then we don't need to do anything
            if key in params:
                continue
            # If we don't have it, but it's required, then fail
            if value.req:
                raise KeyError(
                    f"Missing required param {key} for {type(self).__name__.lower()}"
                )
            # If it's a reference by default, fill that in
            if value.config_ref is not None:
                tmp = getattr(config, value.config_ref[0])
                params[key] = (
                    tmp[value.config_ref[1:]] if len(value.config_ref) > 1 else tmp
                )
            # Otherwise, put in the default value (if it exists)
            elif value.default is not None:
                params[key] = value.default

    def _params(self):
        return self._params_obj

    def _config(self):
        return self._config_obj

    @abstractmethod
    def __init__(self, config: Config):
        self._config_obj = config
        if self.getParams() is not None:
            self._params_obj = config.params.setdefault(type(self).__name__.lower(), {})
            self._checkParams()


class Targeted(ABC):
    @staticmethod
    @abstractmethod
    def getTarget() -> str:
        """Should return the target that this object attacks/decodes"""
        pass


class Checker(Generic[T], ConfigurableModule):
    @abstractmethod
    def check(self, text: T) -> Optional[str]:
        """Should return some description (or an empty string) on success, otherwise return None"""
        pass

    @abstractmethod
    def getExpectedRuntime(self, text: T) -> float:
        pass

    def __call__(self, *args):
        return self.check(*args)

    @abstractmethod
    def __init__(self, config: Config):
        super().__init__(config)


# class Detector(Generic[T], ConfigurableModule, KnownUtility, Targeted):
#     @abstractmethod
#     def scoreLikelihood(self, ctext: T) -> Dict[str, float]:
#         """Should return a dictionary of (cipher_name: score)"""
#         pass
#
#     def __call__(self, *args): return self.scoreLikelihood(*args)
#
#     @abstractmethod
#     def __init__(self, config: Config): super().__init__(config)


class Decoder(Generic[T, U], ConfigurableModule, Targeted):
    """Represents the undoing of some encoding into a different (or the same) type"""

    @abstractmethod
    def decode(self, ctext: T) -> Optional[U]:
        pass

    @staticmethod
    @abstractmethod
    def priority() -> float:
        """What proportion of decodings are this?"""
        pass

    def __call__(self, *args):
        return self.decode(*args)

    @abstractmethod
    def __init__(self, config: Config):
        super().__init__(config)


class DecoderComparer:
    value: Type[Decoder]

    def __le__(self, other: "DecoderComparer"):
        return self.value.priority() <= other.value.priority()

    def __ge__(self, other: "DecoderComparer"):
        return self.value.priority() >= other.value.priority()

    def __lt__(self, other: "DecoderComparer"):
        return self.value.priority() < other.value.priority() and self != other

    def __gt__(self, other: "DecoderComparer"):
        return self.value.priority() > other.value.priority() and self != other

    def __init__(self, value: Type[Decoder]):
        self.value = value

    def __repr__(self):
        return f"<DecoderComparer {self.value}:{self.value.priority()}>"


class CrackResult(NamedTuple, Generic[T]):
    value: T
    key_info: Optional[str] = None
    misc_info: Optional[str] = None


class CrackInfo(NamedTuple):
    success_likelihood: float
    success_runtime: float
    failure_runtime: float


class Cracker(Generic[T], ConfigurableModule, Targeted):
    @abstractmethod
    def getInfo(self, ctext: T) -> CrackInfo:
        """Should return some informed guesses on resource consumption when run on `ctext`"""
        pass

    @abstractmethod
    def attemptCrack(self, ctext: T) -> List[CrackResult]:
        """
        This should attempt to crack the cipher in `ctext`, and return a list of candidate solutions
        """
        # FIXME: Actually List[CrackResult[T]], but python complains
        pass

    def __call__(self, *args):
        return self.attemptCrack(*args)

    @abstractmethod
    def __init__(self, config: Config):
        super().__init__(config)


class ResourceLoader(Generic[T], ConfigurableModule):
    @abstractmethod
    def whatResources(self) -> Optional[Set[str]]:
        """
        Return a set of the names of instances of T you can provide.
        The names SHOULD be unique amongst ResourceLoaders of the same type

        These names will be exposed as f"{self.__name__}::{name}"; use split_resource_name to recover them

        If you cannot reasonably determine what resources you provide, return None instead
        """
        pass

    @abstractmethod
    def getResource(self, name: str) -> T:
        """
        Returns the requested resource

        The behaviour is undefined if `name not in self.whatResources()`
        """
        pass

    def __call__(self, *args):
        return self.getResource(*args)

    def __getitem__(self, *args):
        return self.getResource(*args)

    @abstractmethod
    def __init__(self, config: Config):
        super().__init__(config)


class SearchLevel(NamedTuple):
    name: str
    result: CrackResult


class SearchResult(NamedTuple):
    path: List[SearchLevel]
    check_res: str


class Searcher(ConfigurableModule):
    """A very basic interface for code that plans out how to crack the ciphertext"""

    @abstractmethod
    def search(self, ctext: Any) -> SearchResult:
        """Returns the path to the plaintext"""
        pass

    @abstractmethod
    def __init__(self, config: Config):
        super().__init__(config)


def pretty_search_results(res: SearchResult, display_intermediate: bool = False):
    ret: str = f'Final result: "{res.path[-1].result.value}"\n'
    if len(res.check_res) != 0:
        ret += f"Checker: {res.check_res}\n"
    ret += "Format used:\n"

    def add_one():
        nonlocal ret
        ret += f" {i.name}"
        already_broken = False
        if i.result.key_info is not None:
            ret += f":\n Key: {i.result.key_info}\n"
            already_broken = True
        if i.result.misc_info is not None:
            if not already_broken:
                ret += ":\n"
            ret += f" Misc: {i.result.misc_info}\n"
            already_broken = True
        if display_intermediate:
            if not already_broken:
                ret += ":\n"
            ret += f' Value: "{i.result.value}"\n'
            already_broken = True
        if not already_broken:
            ret += "\n"

    # Skip the 'input' level and print in reverse order
    for i in res.path[1:][::-1]:
        add_one()

    # Remove trailing newline
    return ret[:-1]


# Some common collection types
Distribution = Dict[str, float]
WordList = Set[str]
@@ -0,0 +1,143 @@
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import (
    Any,
    Callable,
    Dict,
    Generic,
    Optional,
    List,
    NamedTuple,
    TypeVar,
    Type,
    Union,
    Set,
    Tuple,
)
import pydoc

try:
    from typing import get_origin, get_args
except ImportError:
    from typing_inspect import get_origin, get_args

from loguru import logger
from . import _fwd
from ._modules import *
import datetime


class Registry:
    # I was planning on using __init_subclass__, but that is incompatible with dynamic type creation when we have
    # generic keys

    RegElem = Union[List[Type], Dict[Type, "RegElem"]]

    _reg: Dict[Type, RegElem] = {}
    _names: Dict[str, Tuple[Type, Set[Type]]] = {}
    _targets: Dict[str, Dict[Type, List[Type]]] = {}
    _modules = {Checker, Cracker, Decoder, ResourceLoader, Searcher}

    def _register_one(self, input_type, module_base, module_args):
        target_reg = self._reg.setdefault(module_base, {})
        # Seek to the given type
        for subtype in module_args[0:-1]:
            target_reg = target_reg.setdefault(subtype, {})
        target_reg.setdefault(module_args[-1], []).append(input_type)

    def _real_register(self, input_type: type, *args) -> Type:
        name_target = self._names[input_type.__name__.lower()] = (input_type, set())

        if issubclass(input_type, Targeted):
            target = input_type.getTarget()
        else:
            target = None

        if issubclass(input_type, Searcher):
            module_type = module_base = Searcher
            module_args = ()
        else:
            module_type: Optional[Type] = None
            module_base = None

            # Work out what module type this is
            if len(args) == 0:
                for i in input_type.__orig_bases__:
                    if module_type is not None:
                        raise TypeError(f"Type derived from multiple registrable base classes {i} and {module_type}")
                    module_base = get_origin(i)
                    if module_base not in self._modules:
                        continue
                    module_type = i
            else:
                for i in self._modules:
                    if not issubclass(input_type, i):
                        continue
                    if module_type is not None:
                        raise TypeError(f"Type derived from multiple registrable base classes {i} and {module_type}")
                    module_type = i
            if module_type is None:
                raise TypeError("No registrable base class")

        # Now handle the difference between register and register_multi
        if len(args) == 0:
            if module_base is None:
                raise TypeError("No type argument given")
            self._register_one(input_type, module_base, get_args(module_type))
            name_target[1].add(module_base)
        else:
            if module_base is not None:
                raise TypeError(f"Redundant type argument for {module_type}")
            module_base = module_type
            for module_args in args:
                # Correct missing brackets
                if not isinstance(module_args, tuple):
                    module_args = (module_args,)

                self._register_one(input_type, module_base, module_args)
                name_target[1].add(module_type[module_args])

        name_target[1].add(module_type)

        if target is not None and issubclass(module_base, Targeted):
            self._targets.setdefault(target, {}).setdefault(module_type, []).append(input_type)

        return input_type

    def register(self, input_type):
        # Must return the type so that this works as a class decorator
        return self._real_register(input_type)

    def register_multi(self, *x):
        return lambda input_type: self._real_register(input_type, *x)

    def __getitem__(self, i: type) -> Optional[Any]:
        target_type = get_origin(i)
        # Check if this is a non-generic type, and return the whole dict if it is
        if target_type is None:
            return self._reg[i]

        target_subtypes = get_args(i)
        target_list = self._reg.setdefault(target_type, {})
        for subtype in target_subtypes:
            target_list = target_list.setdefault(subtype, {})
        return target_list

    def get_named(self, name: str, type_constraint: Type = None) -> Any:
        ret = self._names[name.lower()]
        if type_constraint and type_constraint not in ret[1]:
            raise TypeError(f"Type mismatch: wanted {type_constraint}, got {ret[1]}")
        return ret[0]

    def get_targeted(
        self, target: str, type_constraint: Type = None
    ) -> Optional[Union[Dict[Type, Set[Type]], Set[Type]]]:
        x = self._targets.get(target)
        if x is None or type_constraint is None:
            return x
        return x.get(type_constraint)

    def __str__(self):
        return f"ciphey.iface.Registry {{_reg: {self._reg}, _names: {self._names}, _targets: {self._targets}}}"


_fwd.registry = Registry()
@@ -91,44 +91,44 @@ class mathsHelper:
        while counter_max < counter_prob:
            max_overall = 0
            highest_key = None
-           logger.debug(
+           logger.trace(
                f"Running while loop in sort_prob_table, counterMax is {counter_max}"
            )
            for key, value in prob_table.items():
-               logger.debug(f"Sorting {key}")
+               logger.trace(f"Sorting {key}")
                maxLocal = 0
                # for each item in that table
                for key2, value2 in value.items():
-                   logger.debug(
+                   logger.trace(
                        f"Running key2 {key2}, value2 {value2} for loop for {value.items()}"
                    )
                    maxLocal = maxLocal + value2
-                   logger.debug(
+                   logger.trace(
                        f"MaxLocal is {maxLocal} and maxOverall is {max_overall}"
                    )
                    if maxLocal > max_overall:
-                       logger.debug(f"New max local found {maxLocal}")
+                       logger.trace(f"New max local found {maxLocal}")
                        # because the dict doesnt reset
                        max_dict_pair = {}
                        max_overall = maxLocal
                        # so eventually, we get the maximum dict pairing?
                        max_dict_pair[key] = value
                        highest_key = key
-           logger.debug(f"Highest key is {highest_key}")
+           logger.trace(f"Highest key is {highest_key}")
            # removes the highest key from the prob table
-           logger.debug(f"Prob table is {prob_table} and highest key is {highest_key}")
-           logger.debug(f"Removing {prob_table[highest_key]}")
+           logger.trace(f"Prob table is {prob_table} and highest key is {highest_key}")
+           logger.trace(f"Removing {prob_table[highest_key]}")
            del prob_table[highest_key]
-           logger.debug(f"Prob table after deletion is {prob_table}")
+           logger.trace(f"Prob table after deletion is {prob_table}")
            counter_max += 1
            empty_dict = {**empty_dict, **max_dict_pair}

        # returns the max dict (at the start) with the prob table
        # this way, it should always work on most likely first.
-       logger.debug(
+       logger.trace(
            f"The prob table is {prob_table} and the maxDictPair is {max_dict_pair}"
        )
-       logger.debug(f"The new sorted prob table is {empty_dict}")
+       logger.trace(f"The new sorted prob table is {empty_dict}")
        return empty_dict

    @staticmethod
@@ -145,11 +145,11 @@ class mathsHelper:

        """
        # (f"d is {d}")
-       logger.debug(f"The old dictionary before new_sort() is {new_dict}")
+       logger.trace(f"The old dictionary before new_sort() is {new_dict}")
        sorted_i = OrderedDict(
            sorted(new_dict.items(), key=lambda x: x[1], reverse=True)
        )
-       logger.debug(f"The dictionary after new_sort() is {sorted_i}")
+       logger.trace(f"The dictionary after new_sort() is {sorted_i}")
        # sortedI = sort_dictionary(x)
        return sorted_i

@@ -185,7 +185,3 @@ class mathsHelper:
        """
        text: str = str(text).translate(str.maketrans("", "", punctuation))
        return text
-
-
-
-
@@ -1,71 +0,0 @@
# i need the below code to make tensorflow shut up
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import tensorflow as tf
from scipy.stats import chisquare
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.layers import (
    Activation,
    Conv2D,
    Dense,
    Dropout,
    Flatten,
    MaxPooling2D,
    Reshape,
)
from tensorflow.keras.models import Sequential, load_model
from string import punctuation
import numpy
import sys
import cipheydists

sys.path.append("..")
try:
    import ciphey.mathsHelper as mh
except ModuleNotFoundError:
    import mathsHelper as mh

# i need the below code to make tensorflow shut up. Yup, it's SO bad you have to have 2 LINES TO MAKE IT SHUT UP!!!
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)


class NeuralNetwork:
    """
    Class to use the neural network
    """

    def __init__(self):
        self.CATEGORIES = ["sha1", "md5", "sha256", "sha512", "caeser", "plaintext"]
        self.CATEGORIES = [1, 2, 3, 4, 5, 6]
        # self.MODEL = load_model("cipher_detector.h5")
        self.MODEL = load_model(cipheydists.get_model("cipher_detector"))

        self.mh = mh.mathsHelper()

    def formatData(self, text):
        """
        formats the data
        """
        result = []
        result.append(len(text))
        result.append(len(list(set(list(text)))))
        return result

    def editData(self, data):
        """
        Data has to be in format:
        * [length of text, how many unique letters it has, the normalised chi square score]
        """
        new = []
        new.append(self.formatData(data))
        return numpy.asarray(new)

    def predictnn(self, text):
        """
        use this to create predictions for the NN
        returns softmax (probability distribution)
        """
        text = self.editData(text)
        return self.MODEL.predict(text)
@@ -13,6 +13,46 @@ These are taken from the GitHub Issues tab.
* The code is now pep8'd
* Move to Poetry from setuptools.py
* Moved to Pytest from unittest
## 5.0.0 _The Great Refactor_
_The Great Refactor_ is version 5 of Ciphey. The entire program was refactored.
#### Features
* Added base58 Bitcoin
* Added base58 Ripple
* Added Base62 (link shortener char set)
* Added base85
* Added base85 ASCII
* **A brand new cipher detection interface**
* **A much faster, much more accurate `brandon` interface, which is the default plaintext checking interface**
* Recursive decryption methods. Is your text base64 -> binary -> caesar -> vigenere? Ciphey can handle it now. I was told I'm not supposed to talk about nerdy things in the changelog but... we're using A* search, with the weight being how many computations it takes and the heuristic being the likelihood. Pretty nifty!
* Now on Winget (Windows Package Manager)
* The Brandon interface now has a stopwords checker, with 97% accuracy and high speeds (0.0000006 seconds on average).
* The Brandon checker's dictionary checker has 99% accuracy on average across all sentence lengths, and completes in 0.002 seconds.
* Added a regex checker, so the user can enter a regex like `THM{*}` and the checker will find it.
* Added a neural network that can detect English.
* Created `settings.yml`, a settings file which lets the user change how Ciphey works internally.
* Added the flag `--where`, which tells you where Ciphey expects the `settings.yml` file to be.
* Added `regexFile` to `settings.yml`, which is where the user can store all the regexes they want the regex checker to check against.
* Now on Homebrew for macOS
* Now on the Arch User Repository
#### Bug Fixes
* Morse code is now better optimised and works across multiple different Morse alphabets.
* Fixed an issue where Vigenere broke on inputs containing equals signs.
* Fixed an issue where dictionary.txt was too small.
* Updated the stopwords & 1k-words dictionaries.
#### Maintenance
* TensorFlow is reduced from a 500 MB install to a 1 MB install using TF Lite.
* Models are now parsed in C++
* More documentation written
* Changed the Contributing file
* Created speed_test.py, which is used to help add new languages to the Brandon checker.
* Added the JSON selection system for CipheyDists.
* The Ciphey main dictionary now supports the UK, USA, AU, and CAN dialects of English.
* Many, many more tests were added to the program.
* A targeting system was added to main(); Ciphey can now internally target any cipher, where previously the cipher couldn't be manually chosen.
* The settings file is automatically searched for in APPDIRS.
* Moved the docs to their own dedicated GitHub repo
* Used Terminalizer to record pretty gifs
* Redesigned the README
## 4.1
#### Features
* Vigenere is now enabled, due to massive performance gains from the C++ core
Encodings
---------

* Hexadecimal
* Binary
* Morse Code
* Morse code with new lines
* Octal Decoding (Base8)

Hashes
-------

What Ciphers are going to be implemented next?
-----------------------------------------------
`See this GitHub issue <https://github.com/Ciphey/Ciphey/issues/63>`_
Structure
---------

.. code:: python

    {
        "ctext": "str: The ciphertext that is being attacked",
        "grep": "bool: The greppable flag",
        "info": "bool: The info flag",
        "debug": "str: The loguru debug level, one of ['TRACE', 'DEBUG', 'WARNING', 'ERROR', None/~]",
        "checker": "str: The name of the language checker class to be used",
        "wordlist": "AbstractSet[str]: The selected wordlist",
        "params": "Dict[str, Dict[str, Union[List[str], str]]]: The given module parameters, indexed by the module name and the param name",
        "modules": "List[str]: Paths to modules that should be loaded",
        "utility_threshold": "float: A value between 0 and 2 representing what Detectors should be used in the first pass",
        "format": "Dict[str, str]: formed of 'in' and 'out', which map to the name of their respective types"
    }

These are the defaults, represented as a YAML config file.
An omission of any field will result in these values being used.

.. code:: yaml

    grep: false
    info: false
    debug: WARNING
    checker: brandon
    format:
      in: str
      out: str
    utility_threshold: 1.5
    score_threshold: 0.8

The following internal modules are loaded by default, even if not specified:

* The ``brandon`` LanguageChecker
* The ``cipheydists`` collection of Distributions, CharSets and WordLists
* The ``json`` Distribution, CharSet and WordLists, which load in a JSON file
* The ``csv`` Distribution, CharSet and WordLists, which load in a CSV file
@ -0,0 +1,47 @@
|
|||
Features
|
||||
==========
|
||||
|
||||
* 20+ encryptions supported :ciphers:`click here for the full list`.
|
||||
* Advance cipher targetting system, making use of artifical intelligence and common sense.
|
||||
Are you sick of artifical intelligence bloating everything? We only use it when it is **absolutely necessary**. The common sense part is important. If we see a string like "010100010100000111" we assume it is binary.
|
||||
* Custom built natural language processing module(s).
|
||||
Language Checker checks to see if the given text is plaintext. We do this either with the Brandon checker (the default checker), the deep neural network, or regex.
|
||||
* Regex checker
|
||||
If you have text you know is plaintext, such as _HTB{e563d8ae4b557d21060bfeb2a06d5cb2}_ but clearly won't be picked up as a language, use the regex checker.
|
||||
* Multi language support
|
||||
Also to note, Ciphey's default checker, Brandon, has multi language support and currently supports English & German.
|
||||
* C++ Core
|
||||
Ciphey has a C++ core for cryptanalysis tidbits. Python is very slow, but C++ is very fast. By offloading the bruteforcing of the program, we saw speed increases such as Caesar Cipher's 30% speed increase.
|
||||
|
||||
* Supports Hashes & Encryptions
|
||||
Other online tools may only support encodings, hashes, or encryptions. Ciphey supports all of them!
|
||||
* Tweakable
|
||||
Ciphey has a settings.yml file. This file lets you tweak the internal procedures of Ciphey. Want to use the German dictionary for phase 1 of language checker, and then the English dictionary? No worries! You can do that.
|
||||
|
||||
Do you have a bunch of regexes, but hate inputting them manually? Store them in the settings file.
|
||||
|
||||
* Extensively tested with a lot of documentation
|
||||
Everytime Ciphey goes for a release, it gets tested by many hand-written unit tests. And then, an automated testing system tests Ciphey 20,000 times over to make sure nothing breaks.
|
||||
|
||||
* Not opionated
|
||||
Base64 has an alternative syntax, but many online decoders don't make use of the alternative syntax. Opting to give you the most popular one. Thus, they are optionated.
|
||||
|
||||
Ciphey strays from this as much as possible. We try not to hold an opinion on anything we don't need to. Alternative syntax is available for many modules, and is automatically tested against. No more worrying if Ciphey
|
||||
|
||||
* Easy to contribute to
|
||||
Want to add a new language? We have an easy to follow guide on this documentation.
|
||||
Want to add more decryption methods? Again, easy to follow guide.
|
||||
|
||||
Ciphey is designed to be as modular as possible, so anyone wishing to contribute simply has to push their module and Ciphey will work with it.
|
||||
|
||||
* Built by the CTF community, for the CTF community
|
||||
|
||||
Ciphey was originally built for the Geocaching community, but is now built mainly for the CTF community. Although, it can be used by anyone.
|
||||
|
||||
Cyclic3 & Brandon (core maintainers) are commitee members of the Liverpool Cyber Security Society. Both regularly attend CTFs and win some too.
|
||||
|
||||
Brandon was #2 on the TryHackMe leaderboards.
|
||||
|
||||
All code contributors or maintainers have been in CTFs, and we are all very active in the CTF community.
|
||||
|
||||
Ciphey is built by the CTF community, for the CTF community.
|
|
How does Ciphey work? An in-depth guide
========================================

First, when Ciphey is run, it parses the arguments using the ``argparse`` library and/or manual parsing, depending on what is given to it.

Then, Ciphey sends the inputted text and arguments to the cipher detection interface.
Brandon Interface
==================
The Brandon interface is the default language-checking interface for Ciphey. It is so named because the algorithm was created by Brandon, and we couldn't come up with any clever names for it at the time.

Contributing your own language
------------------------------
1. Get a dictionary of your language.
2. Get the stop words of your language.
3. Get the top 1000 words of your language.
4. Get the alphabet of your language.
5. Get the frequency distribution of your language. We suggest taking a very popular large text (for English we used Charles Dickens' complete works) and calculating the frequency distribution yourself.
6. Add these to CipheyDists with the appropriate names and in the appropriate folders.
7. Calculate the thresholds / sentence lengths using the program detailed in the section below.
8. Pull request and you're done!

How were the thresholds / sentence lengths chosen?
--------------------------------------------------

Brandon (the person) created a program to automatically test which checkers, sentence lengths, and thresholds were best for the newest version of the Brandon checker.

The most important question in these tests was: "which is the best metric we can use as a phase 1 checker?" The candidates were:

* Lemmatisation
* Stop words
* Check 1000 words
* Word endings
* Word endings with 3 chars

Each one was tested 20,000 times for accuracy & speed. Only stop words & check 1000 words survived this testing, both being highly accurate and incredibly fast.

Stop words is a lot faster than check 1000 words, but on much smaller texts it has terrible accuracy; naturally, longer plaintexts contain more stop words.
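To see why, note that a stop-word phase 1 check is just a fraction test. A sketch, with an illustrative word set (the real checker loads the full cipheydists stop-word list) and the 0.15 threshold from the default settings:

```python
# Tiny illustrative stop-word set; the real checker loads a full list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def stopword_fraction(text):
    """Fraction of whitespace-separated words that are stop words."""
    words = text.lower().split()
    return sum(w in STOPWORDS for w in words) / len(words) if words else 0.0

def looks_like_language(text, threshold=0.15):
    """Pass phase 1 if enough of the words are stop words."""
    return stopword_fraction(text) >= threshold

print(looks_like_language("it is the end of the world"))  # True
print(looks_like_language("khoor zruog wkh dwwdfn"))      # False
```

Short texts may contain no stop words at all, which is exactly why this test falls apart on small inputs.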
Naturally, Brandon questioned whether it was worth checking the length of the text and changing the checker to increase accuracy whilst maintaining high speed.

Preliminary tests showed that it was. Stop words had an accuracy of 85% on shorter texts, whereas check 1000 words had an accuracy of 97%. On much longer texts, stop words had equal accuracy but was much faster.

A sentence is defined as "a single sentence from the corpus of Hansard.txt". The sentence lengths tested were 1, 2, 3, 4, 5 and 20.

After Brandon had found the best checkers for the various sentence lengths, he calculated the mean len() of each sentence. This is as follows:

| 1 : The mean is 87.62
| 2 : The mean is 110.47925
| 3 : The mean is 132.20016666666666
| 4 : The mean is 154.817125
| 5 : The mean is 178.7297
| 20: The mean is 714.9188

Next, the question of percentage thresholds.

Brandon realised that hard-coding thresholds (such as 55%) was a bad idea. Surely there exist ideal thresholds that optimise the accuracy of the checker, and surely these thresholds change with sentence length (stop words would need a higher threshold for smaller texts, but as the text size increases it can use a lower threshold).

This means that the threshold & checker change depending on the text size.

What languages are supported?
-----------------------------
* English
* German
How do I know you're not taking the plaintext and storing it?
=============================================================
Valid concerns, but here are multiple ways to make sure we aren't taking your plaintext and storing it somewhere:

1. Read the source code on GitHub.
2. Read the source code of the file you downloaded.
3. Use Burp Suite to look at what is being sent to us.
4. Use Wireshark to do the same.
5. Understand that we are 2 university students and this is run off a GitHub Education Pack plan. We literally do not have the resources to do anything bad.
6. If we did store the plaintext and our university found out, we would lose our degrees, and that's scary.
7. Checksums.
8. We have no interest in the plaintext; it's not useful to us at all.
9. If you are still paranoid, you can copy the file and open a GitHub issue with it.
10. We're mentors / presidents / committee members of many cyber security organisations. If we did something this stupid, we would lose not only our degrees but all credibility.
The Settings File
=================

The settings file contains settings for Ciphey. Some of the things you may want it for:

* Regex list. Have a list of regexes for the regex checker? Use the settings file.
* Default language. Hate how Ciphey always loads English? Use the settings file to change the default language to whatever you want.
* Is the language checker not working how you want it to? Fine-tune the details in the settings file.

Default settings file
---------------------
Save this as settings.yml in the appdirs location, which can be found by running ``ciphey -where`` or ``--where``.

.. code-block:: shell

    ➜ python3 ciphey -where
    settings.yml should be placed in /home/bee/.config/ciphey

From this example, we can see that we need to place the settings file at /home/bee/.config/ciphey/settings.yml.
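If you want to compute that location yourself, this stdlib sketch approximates what an appdirs-style user-config lookup resolves to on each platform (an approximation for illustration, not the exact appdirs logic):

```python
import os
import sys

def ciphey_config_dir():
    """Approximate per-platform user config directory for the "ciphey" app."""
    if sys.platform == "win32":
        base = os.environ.get("APPDATA", os.path.expanduser("~"))
    elif sys.platform == "darwin":
        base = os.path.expanduser("~/Library/Application Support")
    else:
        # Linux/BSD: honour XDG, fall back to ~/.config
        base = os.environ.get("XDG_CONFIG_HOME", os.path.expanduser("~/.config"))
    return os.path.join(base, "ciphey")

print(f"settings.yml should be placed in {ciphey_config_dir()}")
```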
The settings file follows a specific format. **Copy and paste this below!**

.. code-block:: yaml

    ---
    language_checker_options:
      # The language checking options. Basically, this detects plaintext.
      default_language: "english" # What language do you want to use?
      default_checker: "brandon"
      english:
        dict_name: english # the name of the dict in CipheyDists
        stopwords_name: english # The name of the stopwords set in CipheyDists
        brandon: # The brandon checker, the default checker
          thresholds:
            # Sentence length: {Checker: percentage threshold}
            # Want to know how these numbers were selected? Read the docs here TODO
            "Phase 1": {0: {"check": 0.02}, 110: {"stop": 0.15}, 150: {"stop": 0.28}}
            "Phase 2": {0: 0.55} # phase 2 threshold
      german:
        brandon:
          dict_name: german
          stopwords_name: german
          thresholds:
            0.55

    regexFile:
      # Put your custom regexes here.
      # These 4 regexes cover the most popular CTF flag formats.
      # {.*} means "any text of any size here" and /i means "ignore case".
      # For example, for the CTF NoobCTF the format would be /NoobCTF{.*}/i
      - /HTB{.*}/i # TODO HTB strings are just md5s
      - /THM{.*}/i
      - /FLAG{.*}/i
      - /CTF{.*}/i

Some of the notable options you may want to change:

* Default language
* Default checker

And to add more regexes, simply list them under the others.
# Entry point used for PyInstaller

from ciphey.__main__ import main

if __name__ == "__main__":
    main()
# -*- mode: python ; coding: utf-8 -*-

block_cipher = None


a = Analysis(['entry_point.py'],
             pathex=['/home/bee/Documents/Ciphey'],
             binaries=[],
             datas=[],
             hiddenimports=[],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
          cipher=block_cipher)
exe = EXE(pyz,
          a.scripts,
          a.binaries,
          a.zipfiles,
          a.datas,
          [],
          name='entry_point',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          upx_exclude=[],
          runtime_tmpdir=None,
          console=True)
[tool.poetry]
name = "ciphey"
version = "5.0.0rc1"
description = "Automated Decryption Tool"
authors = ["Brandon <brandon@skerritt.blog>"]
license = "MIT"
documentation = "https://docs.ciphey.online"
exclude = ["tests/hansard.txt"]

[tool.poetry.dependencies]
python = "^3.7"
tensorflow = "^2.1.0"
rich = "^1.2.3"
loguru = "^0.5.0"
pylint = "^2.5.2"
flake8 = "^3.8.2"
tflite = "^2.2.0"
cipheydists = "^0.3.5"
cipheycore = "^0.2.2"
appdirs = "^1.4.4"
typing_inspect = { version = "^0.6.0", python = "~3.6 || ~3.7" }
base58 = "^2.0.1"
pybase62 = "^0.4.3"
click = "^7.1.2"
click-option-group = "^0.5.1"
click-completion = "^0.5.2"
click-spinner = "^0.1.10"
pyyaml = "^5.3.1"

[tool.poetry.dev-dependencies]
pytest-cov = "^2.9.0"
settings.yml:

---
language_checker_options:
  # The language checking options. Basically, this detects plaintext.
  default_language: cipheydists::english # What language do you want to use?
  default_checker: brandon
  english:
    dict_name: cipheydists::english # the name of the dict in CipheyDists
    stopwords_name: cipheydists::englishStopWords # The name of the stopwords set in CipheyDists
    top1000: cipheydists::english1000
    brandon: # The brandon checker, the default checker
      thresholds:
        # Sentence length: {Checker: percentage threshold}
        # Want to know how these numbers were selected? Read the docs here TODO
        "Phase 1": {0: {"check": 0.02}, 110: {"stop": 0.15}, 150: {"stop": 0.28}}
        "Phase 2": {0: {"dict": 0.92}, 75: {"dict": 0.80}, 110: {"dict": 0.65}, 150: {"dict": 0.55}, 190: {"dict": 0.38}} # phase 2 thresholds
  german:
    brandon:
      dict_name: german
      stopwords_name: german
      thresholds:
        0.55

regexFile:
  # Put your custom regexes here.
  # These 4 regexes cover the most popular CTF flag formats.
  # {.*} means "any text of any size here" and /i means "ignore case".
  # For example, for the CTF NoobCTF the format would be /NoobCTF{.*}/i
  - /HTB{.*}/i # TODO HTB strings are just md5s
  - /THM{.*}/i
  - /FLAG{.*}/i
  - /CTF{.*}/i
If I'm reading this correctly:
> I would suggest a simple lower bound test: we pass if we get more than 25%, and fail if we get lower than 5% (or smth idk) for n consecutive windows.

You're suggesting that we run all tests and see if we get 25%? Imo that would be much slower. What do you mean by `n windows`?

Okay, chi-squared is out then!

> Perhaps we can return an object from the cracker which states what tests have been performed, to save time on redundant analysis. With such information, brandon could make an intelligent decision to just use a wordlist if enough analysis was performed, and the more detailed analysis if it wasn't.

This is entirely possible. I will add support to the `brandon` checker to skip phase 1 if it receives a dictionary with the key `"phase1": True`, where `True == skip phase 1`.

If you have more tests, let me know and I can factor them in.

In your first reply:
https://github.com/Ciphey/Ciphey/issues/90#issuecomment-645046918
Point 3:
> Be aware that the stuff passed to the checker will most likely be complete gibberish (with a similar freq dist) OR the correct result. A user will not care about an extra second spent on the final correct result, but really will care that every false candidate takes an extra second. The current suggestion seems to be pessimal for the gibberish inputs: maybe add some sanity checks (have I failed to match any word, have I failed to lemmatise any word, etc.)

I decided to test how well `lem` worked as phase 1. To do this, I created this program:
```python
"""
TL;DR

Tested over 20,000 times.

Maximum sentence size is 15 sentences.
1/2 chance of getting 'gibberish' (encrypted text)
1/2 chance of getting English text

Each test is timed using the time module.
The accuracy is calculated as how many true positives we get over the entire run.
"""

import random
import time
from statistics import mean

import spacy
import enciphey
from alive_progress import alive_bar

nlp = spacy.load("en_core_web_sm")

f = open("hansard.txt", encoding="ISO-8859-1").read()
f = f.split(".")

enciph = enciphey.encipher()


def lem(text):
    """Return the set of lemmas in the text."""
    sentences = nlp(text)
    return set(word.lemma_ for word in sentences)


def get_random_sentence():
    """50/50: (True, English text) or (False, encrypted text)."""
    if random.randint(0, 1) == 0:
        return (True, " ".join(random.sample(f, k=random.randint(1, 50))))
    else:
        x = enciph.getRandomEncryptedSentence()
        return (False, x["Encrypted Texts"]["EncryptedText"])


# Now to time it and take measurements
def perform():
    # calculate accuracy
    total = 0
    true_returns = 0

    # calculate average time
    time_list = []

    # average sentence size
    sent_size_list = []
    items = range(20000)
    with alive_bar(len(items)) as bar:
        for i in items:
            truthy, text = get_random_sentence()
            sent_size_list.append(len(text))

            # should be length of chars
            old = len(text)

            # timing the function
            tic = time.perf_counter()
            new = lem(text)
            toc = time.perf_counter()

            # checking for accuracy
            new = len(new)
            # the `and` here means we only count true positives
            if new < old and truthy:
                true_returns += 1
            total += 1

            # appending the time
            time_list.append(toc - tic)
            bar()

    print(
        f"The accuracy is {(true_returns / total) * 100} \n"
        f"and the time it took is {round(mean(time_list), 2)}. \n"
        f"The average string size was {mean(sent_size_list)}"
    )


perform()
```
The results were fascinating, to say the least.
With a 50/50 chance of the text being gibberish (ciphertext from enCiphey) or sentences from Hansard.txt, we had these results for using lemmatisation as phase 1:

```
The accuracy is 49.63%
and the time it took is 0.02 seconds on average.
The average string size was 1133.63255.
```

**We get 50% accuracy with a speed of 0.02 seconds on average, across 20k tests with the average size of a string being 1133 chars.**

The accuracy is quite bad, considering that a coin flip is 50/50.
On average, the user would expect phase 2 to be entered 50% of the time, which is annoying as phase 2 is quite slow, even though phase 1 by itself is quite fast.

I am going to build the "2nd phase" of phase 1 using the while loop we saw earlier. If we can combine just one more metric, we should see much higher accuracy and, again, likely incredibly low latency.

I will create a table of my results:

## Table of max sentence length == 50

| Name                | Speed                        | Accuracy | String Size Average Chars | Epochs | Max Sentence Size |
| ------------------- | ---------------------------- | -------- | ------------------------- | ------ | ----------------- |
| Lemmatisation (lem) | 0.02 seconds                 | 50%      | 1580                      | 20,000 | 50                |
| Stop word removal   | 3.05465052884756e-05 seconds | 96%      | 1596                      | 20,000 | 50                |
| Check1000Words      | 0.0005 seconds               | 96%      | 1597                      | 20,000 | 50                |
| Word endings        | 0.0009 seconds               | 95%      | 1597                      | 20,000 | 50                |

## Table of max sentence length == 5

| Name                | Speed                          | Accuracy | String Size Average Chars | Epochs | Max Sentence Size |
| ------------------- | ------------------------------ | -------- | ------------------------- | ------ | ----------------- |
| Lemmatisation (lem) |                                |          |                           |        |                   |
| Stop word removal   | 1.1574924453998391e-05 seconds | 93%      | 569                       | 20,000 | 5                 |
| Check1000Words      | 0.0006 seconds                 | 95%      | 586                       | 20,000 | 5                 |
| Word endings        | 0.0003 seconds                 | 92%      | 482                       | 20,000 | 5                 |

## Table of max sentence length == 1

| Name                | Speed                           | Accuracy | Threshold | String Size Average Chars | Epochs | Max Sentence Size |
| ------------------- | ------------------------------- | -------- | --------- | ------------------------- | ------ | ----------------- |
| Lemmatisation (lem) |                                 |          |           |                           |        |                   |
| Stop word removal   | 1.2532061150591289e-05 seconds  | 50%      |           | 481                       | 20,000 | 1                 |
| Check1000Words      | 0.0006 seconds                  | 95%      |           | 586                       | 20,000 | 5                 |
| Word endings        | 0.0002 seconds                  | 86%      | 15        | 482                       | 20,000 | 1                 |
## Confusion Matrices & Notes

### Lemmatisation

```
          Positive  Negative
Positive     10031      9967
Negative         2         0
```

### Stop Words
This test was performed without lowercasing the text first, so the actual accuracy _may_ be a tiny bit higher, since the stop-words list is all lowercase.

50 sentence limit:

```
          Positive  Negative
Positive      9913       855
Negative        56      9176
```

5 sentence limit:

```
          Positive  Negative
Positive      9513       967
Negative       530      8990
```

### Check 1000 words

50 sentence limit:

```
          Positive  Negative
Positive     10008       552
Negative        56      9384
```

5 sentence limit:

```
          Positive  Negative
Positive      9563       597
Negative       397      9443
```
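As a sanity check, the headline accuracy figures can be recomputed from these matrices; for example, the check-1000-words matrix with the 50-sentence limit:

```python
def metrics(tp, fn, fp, tn):
    """Accuracy and precision from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
    }

# Check-1000-words, 50-sentence-limit matrix from above.
m = metrics(10008, 552, 56, 9384)
print(round(m["accuracy"], 3))  # 0.97
```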
# Analysis
**I believe the best Brandon checker will look at the length of the text and adjust the % threshold and the exact phase 1 checker per text.**

The data below comes from calculations performed over many hours. It shows the best threshold % for the phase 1 checkers with the highest accuracy. These two checkers were chosen because the others showed a maximum accuracy of 58%.

```
{'check 1000 words': {1: {'Accuracy': 0.925, 'Threshold': 2},
                      2: {'Accuracy': 0.95, 'Threshold': 68},
                      3: {'Accuracy': 0.975, 'Threshold': 62},
                      4: {'Accuracy': 0.98, 'Threshold': 5},
                      5: {'Accuracy': 0.985, 'Threshold': 54}},
 'stop words': {1: {'Accuracy': 0.865, 'Threshold': 50},
                2: {'Accuracy': 0.93, 'Threshold': 19},
                3: {'Accuracy': 0.965, 'Threshold': 15},
                4: {'Accuracy': 0.97, 'Threshold': 28},
                5: {'Accuracy': 0.985, 'Threshold': 29}}}
```

Where the numbers are:

```
1 : The mean is 87.62
2 : The mean is 110.47925
3 : The mean is 132.20016666666666
4 : The mean is 154.817125
5 : The mean is 178.7297
```

Looking at this test, it is clear that stop words beats check 1000 words on speed, but its accuracy is a little lower. Stop words is incredibly fast compared to check 1k words, but on smaller inputs the stop-words checker breaks down.

Therefore, we should use the stop-words checker on larger texts, and check 1k words on smaller texts.

More specifically, the stop-words checker for len == 110 has an optimal threshold of 19, whereas check 1k words has an optimal threshold of 68. This means that while stop words can potentially end early, searching only the first 19% of the list, check 1k words would search 68% of the list.

Stop words has a lower accuracy by 2%, but it is much, much faster and its optimal threshold is greatly reduced.

So ideally, we would have this algorithm:

1. Sentence length < 110:
   1. Use check 1k words with a threshold of 2%.
2. Sentence length > 110:
   1. Use stop words with a threshold of 15%.
3. Sentence length > 150:
   1. The stop-words threshold increases to 28%.

This is the ideal optimal phase 1 algorithm for the `brandon` checker.
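That decision rule is tiny in code. A sketch, assuming the stop-word and top-1000-word fractions have already been measured for the candidate text:

```python
def phase1(text, stop_frac, check1k_frac):
    """Apply the length-dependent phase-1 rule described above.

    stop_frac / check1k_frac are the fractions of stop words and
    top-1000 words already measured in `text` (assumed precomputed).
    """
    n = len(text)
    if n < 110:
        return check1k_frac >= 0.02   # check 1k words, 2% threshold
    if n < 150:
        return stop_frac >= 0.15      # stop words, 15% threshold
    return stop_frac >= 0.28          # stop words, 28% threshold
```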
# Phase 2

Phase 2 is the dictionary checker.

First, we look for the best thresholds for the dictionary checker:

```
'checker': {1: {'Accuracy': 0.97, 'Threshold': 99},
            2: {'Accuracy': 0.98, 'Threshold': 98},
            3: {'Accuracy': 0.965, 'Threshold': 68},
            4: {'Accuracy': 0.99, 'Threshold': 93},
            5: {'Accuracy': 0.97, 'Threshold': 92}},
```

The accuracies are good, but the thresholds are simply too high. We're overfitting!

My hypothesis was that because the dictionary contained words of <= 2 chars, such as "a" or "an", it was inflating the hit rate, resulting in a much higher threshold.

To fix this, I only let the checker consider words that are more than 2 chars.
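A sketch of that filtered dictionary check (the word set and minimum length here are illustrative stand-ins for dictionary.txt and the real cut-off):

```python
# Stand-in for dictionary.txt; the real checker loads the full wordlist.
ENGLISH_DICT = {"hello", "world", "there", "attack", "dawn"}

def dict_check(text, threshold, min_len=3):
    """Pass if enough words (of at least min_len chars) are in the dictionary."""
    words = [w for w in text.lower().split() if len(w) >= min_len]
    if not words:
        return False
    hits = sum(w in ENGLISH_DICT for w in words)
    return hits / len(words) >= threshold
```

Short filler words never enter the ratio, so the threshold no longer has to be pushed sky-high to separate English from near-English gibberish.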
|
||||
|
||||
This is the result:
|
||||
```
|
||||
'checker': {1: {'Accuracy': 0.965, 'Threshold': 60},
|
||||
2: {'Accuracy': 0.98, 'Threshold': 77},
|
||||
3: {'Accuracy': 0.985, 'Threshold': 67},
|
||||
4: {'Accuracy': 0.985, 'Threshold': 99},
|
||||
5: {'Accuracy': 0.98, 'Threshold': 47}},
|
||||
```
|
||||
The accuracy stayed around the same, but the threshold went down. Although the threshold was still kind of high. 99% threshold for 4? I restricted the threshold to 75% and:
|
||||
|
||||
```
|
||||
'checker': {1: {'Accuracy': 0.945, 'Threshold': 66},
|
||||
2: {'accuracy': 0.975, 'threshold': 69},
|
||||
3: {'accuracy': 0.98, 'threshold': 71},
|
||||
4: {'accuracy': 0.99, 'threshold': 65},
|
||||
5: {'accuracy': 0.98, 'threshold': 38}},
|
||||
```
|
||||
|
||||
We can see that the accuracy stayed roughly the same, but the threshold went down a lot. The mean appears to be 66% (from just looking at it).
|
||||
|
||||
However, the accuracy for smaller sentence sizes tanked.
|
||||
|
||||
The highest accuracy we had was with the original one. Words <= 2 chars and no limit on threshold.
|
||||
|
||||
If possible, we want to combine the high accuracy on smaller texts while maintaining the generalisation found in the latter checker results.
|
||||
|
||||
The reason we want a smaller threshold is that due to the chunking procedure, it will be much faster on larger texts. The lower the sentence length the higher the threshold is allowed to be.
For phase 2 we are not concerned with speed; we are, however, concerned with accuracy.

I believe that a threshold > 90% is overfitting. I cannot reasonably see it working successfully within Ciphey itself.

My next test will use a maximum threshold of 100%, ignoring words of 1 character or fewer.

```
'checker': {1: {'Accuracy': 0.97, 'Threshold': 93},
            2: {'Accuracy': 0.975, 'Threshold': 82},
            3: {'Accuracy': 0.97, 'Threshold': 96},
            4: {'Accuracy': 0.965, 'Threshold': 31},
            5: {'Accuracy': 0.965, 'Threshold': 74}},
```

For sentence size 1 the accuracy is 97% with a threshold of 93, much higher than in the previous test. For shorter texts, since we don't care about speed, we should use a higher threshold. This test was run 20,000 times. I will run the tests once more to see whether the threshold changes significantly.

The test results were:

```
'checker': {1: {'Accuracy': 0.96, 'Threshold': 92},
            2: {'Accuracy': 0.97, 'Threshold': 95},
            3: {'Accuracy': 0.965, 'Threshold': 81},
            4: {'Accuracy': 0.96, 'Threshold': 38},
            5: {'Accuracy': 0.975, 'Threshold': 52}},
```

One last test: no threshold limit and no character limit.

```
'checker': {1: {'Accuracy': 0.98, 'Threshold': 92},
            2: {'Accuracy': 0.99, 'Threshold': 91},
            3: {'Accuracy': 0.97, 'Threshold': 83},
            4: {'Accuracy': 0.97, 'Threshold': 71},
            5: {'Accuracy': 0.975, 'Threshold': 74}},
```

In total, we want these:

```
{1: {'Accuracy': 0.98, 'Threshold': 92},
 2: {'accuracy': 0.975, 'threshold': 69},
 3: {'accuracy': 0.98, 'threshold': 71},
 4: {'accuracy': 0.99, 'threshold': 65},
 5: {'accuracy': 0.98, 'threshold': 38}},
^^ with 75% threshold limit
```

The thresholds are lower, and the accuracies look good too.
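One way to make the comparison of the runs concrete is to merge them and, per sentence size, keep the highest-accuracy result, breaking ties by preferring the lower (faster) threshold. The numbers below are copied from the runs above; the selection rule itself is an illustration, not part of the tester:

```python
# Each run maps sentence size -> (accuracy, threshold); values copied from above.
runs = [
    {1: (0.97, 93), 2: (0.975, 82), 3: (0.97, 96), 4: (0.965, 31), 5: (0.965, 74)},
    {1: (0.96, 92), 2: (0.97, 95), 3: (0.965, 81), 4: (0.96, 38), 5: (0.975, 52)},
    {1: (0.98, 92), 2: (0.99, 91), 3: (0.97, 83), 4: (0.97, 71), 5: (0.975, 74)},
]

# Pick per size: maximise accuracy, then minimise threshold on ties.
best = {
    size: max((run[size] for run in runs), key=lambda at: (at[0], -at[1]))
    for size in runs[0]
}
print(best)
```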
```diff
@@ -1,4 +1,4 @@
-from ciphey.LanguageChecker.brandon import Brandon
+from ciphey.basemods.Checkers.brandon import Brandon
 config = dict()
 lc = config["checker"](config)
 import unittest
```
```diff
@@ -20,7 +20,7 @@ class encipher:
     def __init__(self):  # pragma: no cover
         """Inits the encipher object"""
         self.text = self.read_text()
-        self.MAX_SENTENCE_LENGTH = 20
+        self.MAX_SENTENCE_LENGTH = 5
         # ntlk.download("punkt")
         self.crypto = encipher_crypto()
```
```diff
@@ -30,13 +30,13 @@ class encipher:
         splits = nltk.tokenize.sent_tokenize(x)
         return splits

-    def getRandomSentence(self):  # pragma: no cover
+    def getRandomSentence(self, size):  # pragma: no cover
         return TreebankWordDetokenizer().detokenize(
-            random.sample(self.text, random.randint(1, self.MAX_SENTENCE_LENGTH))
+            random.sample(self.text, random.randint(1, size))
         )

-    def getRandomEncryptedSentence(self):  # pragma: no cover
-        sents = self.getRandomSentence()
+    def getRandomEncryptedSentence(self, size):  # pragma: no cover
+        sents = self.getRandomSentence(size)

         sentsEncrypted = self.crypto.randomEncrypt(sents)
         return {"PlainText Sentences": sents, "Encrypted Texts": sentsEncrypted}
```
```diff
@@ -11,113 +11,113 @@ class testIntegration(unittest.TestCase):
     """

     def test_basics(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage(
+        lc = LanguageChecker.Checker()
+        result = lc.check(
             "Hello my name is new and this is an example of some english text"
         )
         self.assertEqual(result, True)

     def test_basics_german(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("hallo keine lieben leute nach")
+        lc = LanguageChecker.Checker()
+        result = lc.check("hallo keine lieben leute nach")
         self.assertEqual(result, False)

     def test_basics_quickbrownfox(self):
         """
        This returns True because, by default, chi squared returns True so long as fewer than 10 items have been processed
         """
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("The quick brown fox jumped over the lazy dog")
+        lc = LanguageChecker.Checker()
+        result = lc.check("The quick brown fox jumped over the lazy dog")
         self.assertEqual(result, True)

     def test_basics_quickbrownfox(self):
         """
        This returns True because, by default, chi squared returns True so long as fewer than 10 items have been processed
         """
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("The quick brown fox jumped over the lazy dog")
+        lc = LanguageChecker.Checker()
+        result = lc.check("The quick brown fox jumped over the lazy dog")
         self.assertEqual(result, True)

     def test_chi_maxima_true(self):
         """
        This returns False because the s.d. is not over 1, as all inputs are English
         """
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("sa dew fea dxza dcsa da fsa d")
-        result = lc.checkLanguage("df grtsf a sgrds fgserwqd")
-        result = lc.checkLanguage("fd sa fe safsda srmad sadsa d")
-        result = lc.checkLanguage(" oihn giuhh7hguygiuhuyguyuyg ig iug iugiugiug")
-        result = lc.checkLanguage(
+        lc = LanguageChecker.Checker()
+        result = lc.check("sa dew fea dxza dcsa da fsa d")
+        result = lc.check("df grtsf a sgrds fgserwqd")
+        result = lc.check("fd sa fe safsda srmad sadsa d")
+        result = lc.check(" oihn giuhh7hguygiuhuyguyuyg ig iug iugiugiug")
+        result = lc.check(
             "oiuhiuhiuhoiuh7 a opokp[poj uyg ytdra4efriug oih kjnbjhb jgv"
         )
-        result = lc.checkLanguage("r jabbi tb y jyg ygiuygytff u0")
-        result = lc.checkLanguage("ld oiu oj uh t t er s d gf hg g h h")
-        result = lc.checkLanguage(
+        result = lc.check("r jabbi tb y jyg ygiuygytff u0")
+        result = lc.check("ld oiu oj uh t t er s d gf hg g h h")
+        result = lc.check(
             "posa idijdsa ije i vi ijerijofdj ouhsaf oiuhas oihd "
         )
-        result = lc.checkLanguage(
+        result = lc.check(
             "Likwew e wqrew rwr safdsa dawe r3d hg jyrt dwqefp ;g;;' [ [sadqa ]]."
         )
-        result = lc.checkLanguage("Her hyt e jytgv urjfdghbsfd c ")
-        result = lc.checkLanguage("CASSAE X T H WAEASD AFDG TERFADDSFD")
-        result = lc.checkLanguage("das te y we fdsbfsd fe a ")
-        result = lc.checkLanguage("d pa pdpsa ofoiaoew ifdisa ikrkasd s")
-        result = lc.checkLanguage(
+        result = lc.check("Her hyt e jytgv urjfdghbsfd c ")
+        result = lc.check("CASSAE X T H WAEASD AFDG TERFADDSFD")
+        result = lc.check("das te y we fdsbfsd fe a ")
+        result = lc.check("d pa pdpsa ofoiaoew ifdisa ikrkasd s")
+        result = lc.check(
             "My friend is a really nice people who really enjoys swimming, dancing, kicking, English."
         )
         self.assertEqual(result, True)

     def test_integration_unusual_one(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("HELLO MY NAME IS BRANDON AND I LIKE DOLLAR")
+        lc = LanguageChecker.Checker()
+        result = lc.check("HELLO MY NAME IS BRANDON AND I LIKE DOLLAR")
         self.assertEqual(result, True)

     def test_integration_unusual_two(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
+        lc = LanguageChecker.Checker()
+        result = lc.check("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
         self.assertEqual(result, False)

     def test_integration_unusual_three(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("password")
+        lc = LanguageChecker.Checker()
+        result = lc.check("password")
         self.assertEqual(result, True)

     def test_integration_unusual_three(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("")
+        lc = LanguageChecker.Checker()
+        result = lc.check("")
         self.assertEqual(result, False)

     def test_integration_unusual_four(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage(".")
+        lc = LanguageChecker.Checker()
+        result = lc.check(".")
         self.assertEqual(result, False)

     def test_integration_unusual_five(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("#")
+        lc = LanguageChecker.Checker()
+        result = lc.check("#")
         self.assertEqual(result, False)

     def test_integration_unusual_7(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage(
+        lc = LanguageChecker.Checker()
+        result = lc.check(
             "999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999"
         )
         self.assertEqual(result, False)

     def test_integration_unusual_7(self):
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("")
+        lc = LanguageChecker.Checker()
+        result = lc.check("")
         self.assertEqual(result, False)

     def test_integration_addition(self):
         """
        Makes sure you can add 2 language objects together
         """
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("hello my darling")
+        lc = LanguageChecker.Checker()
+        result = lc.check("hello my darling")

-        lc2 = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage("sad as dasr as s")
+        lc2 = LanguageChecker.Checker()
+        result = lc.check("sad as dasr as s")

         temp = lc.getChiScore()
         temp2 = lc2.getChiScore()
```
```diff
@@ -132,6 +132,6 @@ class testIntegration(unittest.TestCase):
         Bug is that chi squared does not score this as True
         """
         text = """Charles Babbage, FRS (26 December 1791 - 18 October 1871) was an English mathematician, philosopher, inventor and mechanical engineer who originated the concept of a programmable computer. Considered a "father of the computer", Babbage is credited with inventing the first mechanical computer that eventually led to more complex designs. Parts of his uncompleted mechanisms are on display in the London Science Museum. In 1991, a perfectly functioning difference engine was constructed from Babbage's original plans. Built to tolerances achievable in the 19th century, the success of the finished engine indicated that Babbage's machine would have worked. Nine years later, the Science Museum completed the printer Babbage had designed for the difference engine."""
-        lc = LanguageChecker.LanguageChecker()
-        result = lc.checkLanguage(text)
+        lc = LanguageChecker.Checker()
+        result = lc.check(text)
         self.assertEqual(result, True)
```
@@ -0,0 +1,417 @@
```python
"""
TL;DR

Tested over 20,000 times

Maximum sentence size is 15 sentences
1/2 chance of getting 'gibberish' (encrypted text)
1/2 chance of getting English text

Each test is timed using the time module.
Accuracy is calculated as the proportion of correct classifications (true positives plus true negatives) over the entire run.
"""


import spacy
import random
import time
from statistics import mean
import ciphey
import enciphey
from alive_progress import alive_bar
from spacy.lang.en.stop_words import STOP_WORDS
import cipheydists
import cipheycore
import pprint
from math import ceil


class tester:
    def __init__(self):
        self.nlp = spacy.load("en_core_web_sm")

        self.f = open("hansard.txt", encoding="ISO-8859-1").read()
        self.f = self.f.split(".")

        # self.analysis = cipheycore.start_analysis()
        # for word in self.f:
        #     cipheycore.continue_analysis(self.analysis, word)
        # cipheycore.finish_analysis(self.analysis)

        self.enciph = enciphey.encipher()

        # all stopwords
        self.all_stopwords = set(self.nlp.Defaults.stop_words)
        self.top1000Words = cipheydists.get_list("english1000")
        self.wordlist = cipheydists.get_list("english")
        self.endings = set(
            [
                "al", "y", "sion", "tion", "ize", "ic", "ious", "ness", "ment",
                "ed", "ify", "ence", "fy", "less", "ance", "ship", "ate", "dom",
                "ist", "ish", "ive", "en", "ical", "ful", "ible", "ise", "ing",
                "ity", "ism", "able", "ty", "er", "or", "esque", "acy", "ous",
            ]
        )
        # endings of exactly 3 letters, so we can compare against a word's last 3 chars
        self.endings_3_letters = list(filter(lambda x: len(x) == 3, self.endings))
        self.best_thresholds = {
            name: {size: {"Threshold": 0, "Accuracy": 0} for size in (1, 2, 3, 4, 5)}
            for name in (
                "word endings",
                "word endings with just 3 chars",
                "stop words",
                "check 1000 words",
                "checker",
            )
        }

        # text = "hello my name is Bee and I really like flowers"
        # def checker(self, text: str, threshold: float, text_length: int) -> bool:
        # x = self.checker(text=text, threshold=0.55, text_length=len(text))

    def lem(self, text, threshold):
        sentences = self.nlp(text)
        return set([word.lemma_ for word in sentences])

    def stop(self, text, threshold):
        # returns True if any word in the text is a stopword
        for word in text:
            if word in self.all_stopwords:
                return True
        return False

    def check1000Words(self, text, threshold):
        """Checks to see if any word is in the list of the top 1000 words

        The 1000 words are stored in a dict, so lookup is O(1).

        Args:
            text -> the text we test (a list of words)

        Returns:
            bool -> whether any word is in the dict
        """
        if text is None:
            return False
        # If any of the top 1000 words appear in the text, return True
        for word in text:
            # I was debating using any() here, but I think they're the
            # same speed so it doesn't really matter too much
            if word in self.top1000Words:
                return True
        return False

    def get_random_sentence(self, size):
        # if random.randint(0, 1) == 0:
        #     x = (True, " ".join(random.sample(self.f, k=random.randint(1, size))))
        #     return x
        # else:
        #     x = self.enciph.getRandomEncryptedSentence(size)
        #     x = x["Encrypted Texts"]["EncryptedText"]
        #     return (False, x)
        x = (True, " ".join(random.sample(self.f, k=random.randint(1, size))))
        return x

    def get_words(self, text):
        doc = self.nlp(text)
        toReturn = []
        for token in doc:
            toReturn.append((token.text).lower())
        return toReturn

    def word_endings(self, text, threshold):
        total = len(text)
        if total == 0:
            return False
        positive = 0
        for word in text:
            for word2 in self.endings:
                if word.endswith(word2):
                    positive += 1
        if positive == 0:
            return False
        return True if positive / total > threshold else False

    def word_endings_3(self, text, threshold):
        """Word endings that only end in 3 chars; may be faster to compute"""
        positive = 0
        total = len(text)
        if total == 0:
            return False
        for word in text:
            # compare the word's last 3 characters against the 3-letter endings
            if word[-3:] in self.endings_3_letters:
                positive += 1
        if positive != 0:
            return True if total / positive > threshold else False
        else:
            return False

    # Now to time it and take measurements

    def perform(self, function, sent_size, threshold):
        threshold = threshold / 100
        # calculate accuracy
        total = 0
        true_positive_returns = 0
        true_negative_returns = 0
        false_positive_returns = 0
        false_negatives_returns = 0

        # calculate average time
        time_list = []

        # average sentence size
        sent_size_list = []
        test_range = 200
        for i in range(0, test_range):
            sent = self.get_random_sentence(sent_size)
            text = sent[1]
            truthy = sent[0]
            sent_size_list.append(len(text))

            # should be length of chars
            text = self.get_words(text)
            old = len(text)

            # timing the function
            tic = time.perf_counter()
            result = function(text=text, threshold=threshold, text_length=old)
            toc = time.perf_counter()

            # the `and` here means we only count true positives
            if result and truthy:
                true_positive_returns += 1
            elif result:
                false_positive_returns += 1
            elif not result and truthy:
                false_negatives_returns += 1
            else:
                true_negative_returns += 1

            total += 1

            # appending the time
            time_list.append(toc - tic)

        print(
            f"The accuracy is {str((true_positive_returns+true_negative_returns) / total)} \n and the time it took is {str(mean(time_list))}. \n The average string size was {str(mean(sent_size_list))}"
        )
        print(
            f"""
                      Positive        Negative
        Positive      {true_positive_returns}        {false_positive_returns}
        Negative      {false_negatives_returns}        {true_negative_returns}
        """
        )
        return {
            "Name": function,
            "Threshold": threshold,
            "Accuracy": (true_positive_returns + true_negative_returns) / total,
            "Average_time": mean(time_list),
            "Average_string_len": mean(sent_size_list),
            "Sentence length": sent_size,
            "confusion_matrix": [
                [true_positive_returns, false_positive_returns],
                [false_negatives_returns, true_negative_returns],
            ],
        }

    def perform_3_sent_sizes(self, threshold):
        """
        Gives us the average accuracy and time etc.
        """
        # funcs = [self.checker, self.stop, self.check1000Words]
        funcs = [self.checker]
        names = [
            "checker",
            # "stop words",
            # "check 1000 words",
        ]
        sent_sizes = [1, 2, 3, 4, 5]
        x = {
            # "stop words": {1: None, 2: None, 3: None, 4: None, 5: None, 20: None},
            # "check 1000 words": {1: None, 2: None, 3: None, 4: None, 5: None, 20: None},
            "checker": {1: None, 2: None, 3: None, 4: None, 5: None, 20: None},
        }
        for i in range(0, len(funcs)):
            func = funcs[i]
            for y in sent_sizes:
                x[names[i]][y] = self.perform(func, y, threshold)
        return x

    def perform_best_percentages(self):
        """
        Tells us the optimal percentage thresholds

        TODO: record thresholds for each length of text
        """
        items = range(100)
        with alive_bar(len(items)) as bar:
            for i in range(1, 101):
                x = self.perform_3_sent_sizes(threshold=i)
                pprint.pprint(x)
                for key, value in x.items():
                    # keep the best accuracy (and its threshold) per sentence size
                    for y in [1, 2, 3, 4, 5]:
                        pprint.pprint(x[key])
                        size = y
                        temp1 = x[key][y]["Accuracy"]
                        temp2 = self.best_thresholds[key][size]["Accuracy"]
                        if temp1 > temp2:
                            self.best_thresholds[key][size]["Threshold"] = i
                            self.best_thresholds[key][size]["Accuracy"] = temp1
                pprint.pprint(x)
                bar()
        pprint.pprint(self.best_thresholds)

    def calculate_average_sentence_size(self):
        sent_sizes = [1, 2, 3, 4, 5]
        lengths = []
        for x in sent_sizes:
            for i in range(0, 2000):
                y = self.get_random_sentence(x)
                lengths.append(len(y[1]))
            print(f"{x} : The mean is {mean(lengths)}")

    def checker(self, text: str, threshold: float, text_length: int) -> bool:
        """Given text, determine whether it passes the checker

        The checker uses the wordlist passed to it, i.e. the stopwords list, the top-1000-words list, or the dictionary.

        Args:
            text -> the text to check
            threshold -> the fraction of the text that must be in the wordlist before we return True
            text_length -> the length of the text

        Returns:
            bool -> True if it passes the test, False if it fails."""

        percent = ceil(text_length * threshold)
        meet_threshold = 0
        location = 0
        end = percent

        while location <= text_length:
            # chunk the text, so we only analyse THRESHOLD-sized chunks at a time
            to_analyse = text[location:end]
            for word in to_analyse:
                # if len(word) <= 1:
                #     continue
                # if the word is in the wordlist, +1 to the counter
                if word in self.wordlist:
                    meet_threshold += 1
                if meet_threshold / text_length >= threshold:
                    # Return True as soon as we cross the threshold;
                    # we check inside the loop because if we're at 24% and THRESHOLD is 25%,
                    # we don't want to finish the chunk before returning True
                    return True
            location += 1
        return False


obj = tester()
# X = obj.perform_3_sent_sizes(50)
# x = obj.perform_best_percentages()
x = obj.calculate_average_sentence_size()
```
```diff
@@ -1,12 +1,10 @@
 import sys

-from ciphey.LanguageChecker.brandon import Brandon
+from ciphey.basemods.Checkers.brandon import Brandon
-from ciphey.Decryptor.Encoding.encodingParent import EncodingParent
 from ciphey.__main__ import make_default_config
 import unittest
 from loguru import logger
 import cipheydists


 config = make_default_config("")
```
```diff
@@ -1,4 +1,4 @@
-from ciphey.neuralNetworkMod.nn import NeuralNetwork
+from ciphey.basemods.Decoder.nn import NeuralNetwork
 import numpy

 import unittest
```
@@ -0,0 +1,16 @@
```yaml
Id: Ciphey.Ciphey
Publisher: Ciphey
Name: Ciphey
Version: 5
AppMoniker: Ciphey
MinOSVersion: 10.0.0.0
Description: Automated Decryption Tool
Homepage: https://www.github.com/ciphey/ciphey
License: MIT
LicenseUrl: https://opensource.org/licenses/MIT
InstallerType: exe
Installers:
  - Arch: x64
    Url: https://statics.teams.cdn.office.net/production-windows-x64/1.3.00.4461/Teams_windows_x64.exe
    Sha256: 712f139d71e56bfb306e4a7b739b0e1109abb662dfa164192a5cfd6adb24a4e1
ManifestVersion: 0.1.0
```