5.0.0 first release (#154)

* Started working on brandon checker 2.0, added more docs

* Starting on cipher iface from #104

* More registry work

* Nicer registry

* More iface

* GCSE geography WAS useful

* Refactored logging, and got module loading to work

* added skeleton of object iface

* Quick patch of registry

* Fixed docs

* Holy sheet that's cool!

* Fixing old python

* Logging and testing refactors

* Wow I hate this stupid reference bs

* Can now do iterated encodings

* Morse optimisation

* More migration work of modules into the refactoring, and added list regex

* Added comment

* Uploaded tests

* Can't get Jupyter to work so I'm doing it in raw Python now

* Started fleshing out the builder

* Updated todo list

* Figured out hansard.txt has spelling errors

will try to apply SpaCy model to automatically fix errors

* updates

* Added new dicts to gitignore

* Started combining the 2 dicts



Former-commit-id: 0733c4951dbff373748f4c0c2c3fb2a4d224fd32

* More cleaning of the dicts



Former-commit-id: a1290e2bb67ce7a8bc9626fbc7e294806b1e6913

* More work on fixing up the combined dict



Former-commit-id: 733cdb1635d5cd3f93f52e6afd10efee0f58b50e

* Text is almost made



Former-commit-id: 5e6b2e8e4b230494f082d8f1c4fe1061b8f31ace

* Adding more comments



Former-commit-id: 62ae46be03b2d4e113e4552a97a7e903591b387d

* Working on an automated test



Former-commit-id: 38a28368efb1ebed7cf7a4e8fe6906a59dddc3be

* More tests



Former-commit-id: 0f677a31397c9a358ff4e5d487e35571319f2877

* Switching to i3



Former-commit-id: 47d7112655be06b42f8a8c7c2b6fdfc0e561d8a9

* More testing stuff



Former-commit-id: 5179e8cc8a3bf6554c13a1667bea8b449a547205

* Fixing tests



Former-commit-id: 7e5fefd473f21bad4a21d0e87e9905e63d5fb50c

* Turning the file into a proper class



Former-commit-id: 69058f7768e95d20e33c7bf1e08de362d227bb4e

* Working on automating more of the tester



Former-commit-id: c88803c15f1b4ee1dbc5f5a963bc5856a1fcf96f

* Fixed major bug



Former-commit-id: a4affd37f6fc03567f0fa077ac5ebdd04f7ab7ae

* Added targeting

* Adding more tests



Former-commit-id: 8dcc8788fb1d85f45a50f4943551075969a79f82

* Optimised word endings



Former-commit-id: 0a309423d9278745efeb3c71f06d6a2612e0a213

* switching branches



Former-commit-id: b334b950e5c48d5582febb929e7503602ab4a020

* harlan pls



Former-commit-id: cea325a58b0405417c7909d10906b5e3626d8582

* Made it better



Former-commit-id: 3dbc9e02e997b87424cd0bf890bf2edfd18ccdf0

* stopped using 20k tests



Former-commit-id: 8b7ac4a5b19d5d6df8717e4f068ed853c35b85fb

* started automating threshold calculations



Former-commit-id: a9626bef1dc9c373122b9471ab3cdfe8f4d77a6c

* Started writing more threshold calculations



Former-commit-id: 2a5b844885d7f912e118eebba4b706a3a7db444d

* Maybe this will work



Former-commit-id: 20ab0efa3143a8be19178ee6ae111f2d68b2378f

* Added important todo



Former-commit-id: 1d14c87bd09ce6c4de8364bdaecd3920b5a46648

* Started building the NATO checker



Former-commit-id: 2c4ac906a7fac7bbe799cde9f289632e68df0a1a

* Updated settings file



Former-commit-id: 776143f1714325faac5d75a6dea41e27affcf677

* Added sent size calculator



Former-commit-id: 71f715b87d21efccd2dddd9ecb85b24e765d0af2

* This works



Former-commit-id: 88a82f12879fd8e102657e4e92eddedb48affe3f

* Bug fixes



Former-commit-id: 0445ea0a60829221aea4f531d47ea7af2719b3f1

* This finally works LMAOOOOO yayyyy



Former-commit-id: edc0371251a5ff305c91bbf07feb898402d49965

* Done tests



Former-commit-id: 9734ee69dfc799287532f16d82d050732b7744f7

* Wrote up more docs



Former-commit-id: 350380e54bb7dae7c0cb4f26c6a6d8f2d499af57

* Started creating new brandon iface



Former-commit-id: 055f031cb1ffb279f268e210b2aaa46d8cf1262f

* Building settings



Former-commit-id: 754a9a5498fd8fac823dcdca28743b0f7ff8345a

* Writing documentation



Former-commit-id: 63f3e18dbdcf2d1296dfe39b83aedfff6d09da11

* Started integrating the settings file



Former-commit-id: 108e165f4fcf8f823c205310fd659321f77f6389

* Docs



Former-commit-id: b4875c8a9cc656576031708e625113357e5212dc

* Moving to tf lite

* More settings file stuff



Former-commit-id: 164c0ea9b47fe7c182665ed5dc9072fa3f543cc6

* Added more docs



Former-commit-id: cdae88f4b718422282f97aef527ece54292279a6

* More settings stuff



Former-commit-id: 85b547134692bfe91af0dac5eaf9f4e639d15174

* More updates



Former-commit-id: ff521d1d86938d62307427f5237966a6c907f3a9

* Added regexFile



Former-commit-id: 75f25febbc969359ef7e08a2b1da8cda13f81423

* Adding more comments to settings



Former-commit-id: 54dad6eb30682b9877ef3bb9e21bd2da790172dc

* Added where argument

User can now easily find appdirs

Former-commit-id: 10d047e1e91f1202c662b2313ca4dc587a0c6475

* Wrote more docs



Former-commit-id: 183f2271a40b9f0a37158ee8a97fc096c13fa24a

* Started writing stopwords checker



Former-commit-id: 95090a42a4c0650a1879ee65405482e3918fcaec

* Stopwords theoretically works and is efficient



Former-commit-id: 87150256c5243abddf86e27bad3fd9b4752abbf0

* Changed 1k words



Former-commit-id: b7316f0b2eb0e57becbe61406cb0b0a1a1c7cade

* Refactors



Former-commit-id: ef3049b08a2a34e34f3e45dca244064e1308cc51

* Updates to docs



Former-commit-id: 2ab679d44cf566e4e350daaf4b4d8d3940006604

* End of night



Former-commit-id: 5dacd8bfbe91b185bf0d685056ef511d12c47148

* Last push



Former-commit-id: 55a3cb96f26cf19ff4bcceee288075205e4799e5

* Made resources easier to obtain

* Updates

* Writing more Brandon Checker

* Testing for best threshold of dict checker

* Docs

* Documentation + tests

* More brandon checker stuff

* More documentation

* Made Windows Package Manager Manifest

* Changelog additions

* More changelog

* Changelog

* What

* the iface is bad im sorry

* Added

* Fixed ciphey

* Whoopsie

* here u go broken code

* less broken

* It constructs

* Adding windows and mac os testing

* Removed automated tests

* More git actions

* yess

* actions

* Testing actions some more (sorry for the spam)

* Nox

* pls dont hate me

* more CI

* Added Pyinstaller settings file + the entry point needed

* Updated Pyinstaller

* Added packaging stuff

* goodnight

* Readme changes

* More README updates

* Fixing CI

* README

* One more

* Switching branches

* Updated readme to look at the pretty gif

* readme update

* Switching branches

* Added quorum

* Moved detectors to be only for encodings

* Added discord group to contributing.md

* Added discord links

* ciphey-iface, now featuring ActuallyWorksNow (TM) technology

* Now works even better

* Updated readme

* Some more optimisations

* Added greppable back for *cowards*

* README changes

* Canonicalised handleDecodings

* switching branches

* Made merge_dict

* Whoops

* README

* changes

* changes

* pls github

* README updates

* README

* README updates

* README updates

* README updates

* more updates

* pls

* Added reverse

* Added initial csv work

* I love python typing

* README

* BASE64 -> Base64. Complete -> compete

* Update README.md

* Update README.md

* Some more tweaks

* removed important.md

* Update README.md

* I broke main, Harlan said he will take a look tomorrow

* Added Installation guide to README

* Added CI to README

* Reworked the README, I changed a lot :)

* ReadTheDocs -> Docs.Ciphey.Online

* Added new logo

* Increased size of logo

* Uploaded all lock pictures

* Added thanks to designer

* Update README.md

* Update README.md

* I am an idiot

* Fixed registry bs

* Added runs to steps in github action

* Replaced github action with one from GitHub

* fixing poetry and nox issues

* fixing nox

* Added click spinner to indicate the program is running

* Added AppDirs command to Main

Now the user can find out where ciphey expects the settings file to be

* Update README.md

* Added important links section

* Update README.md

* Added installation guide to another place

* Update README.md

* Almost finished args

* Update README.md

* That should work now

* begone, tensorflow

* Started PKGBuild & A*

* Updated README with new downloads, more A* work

* Update README.md

* Turned important links into a table

* more a star stuff

* Before remake v1

* A* node selection

* A* finally works

* Added comments to allow for easy modification

* switching branches

* Join brandon in

* pseudocode for A*

* fixing brandon checker

* pls harlan fix

* Fixed brandon

* That should be trace

* fixed what to use bug

* Various prodding

* Working brandon

* First RC

Co-authored-by: Brandon <brandonskerritt51@gmail.com>
Co-authored-by: Brandon <10378052+brandonskerritt@users.noreply.github.com>
This commit is contained in:
Cyclic3 2020-07-09 17:09:56 +01:00 committed by GitHub
parent 6f5d61056f
commit b77c4eacd1
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
92 changed files with 4424 additions and 1703 deletions

.github/config.yml (vendored, new file, +2 lines)

@@ -0,0 +1,2 @@
todo:
  keyword: "@TODO"

.github/lock.yml (vendored, new file, +38 lines)

@@ -0,0 +1,38 @@
# Configuration for Lock Threads - https://github.com/dessant/lock-threads-app

# Number of days of inactivity before a closed issue or pull request is locked
daysUntilLock: 60

# Skip issues and pull requests created before a given timestamp. Timestamp must
# follow ISO 8601 (`YYYY-MM-DD`). Set to `false` to disable
skipCreatedBefore: false

# Issues and pull requests with these labels will be ignored. Set to `[]` to disable
exemptLabels: []

# Label to add before locking, such as `outdated`. Set to `false` to disable
lockLabel: false

# Comment to post before locking. Set to `false` to disable
lockComment: >
  This thread has been automatically locked since there has not been
  any recent activity after it was closed. Please open a new issue for
  related bugs.

# Assign `resolved` as the reason for locking. Set to `false` to disable
setLockReason: true

# Limit to only `issues` or `pulls`
# only: issues

# Optionally, specify configuration settings just for `issues` or `pulls`
# issues:
#   exemptLabels:
#     - help-wanted
#   lockLabel: outdated
# pulls:
#   daysUntilLock: 30

# Repository to extend settings from
# _extends: repo

.github/workflows/coverage.yml (vendored, new file, +37 lines)

@@ -0,0 +1,37 @@
name: coverage
on: [push, pull_request]

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: [3.6, 3.7, pypy3]
        # exclude:
        #   - os: macos-latest
        #     python-version: 3.8
        #   - os: windows-latest
        #     python-version: 3.6
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Tests with Nox
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8'
          architecture: x64
      - run: pip3 install nox==2019.11.9
      - run: pip3 install poetry==1.0.5
      - run: nox --sessions tests coverage
        env:
          CODECOV_TOKEN: ${{secrets.CODECOV_TOKEN}}

@@ -1,31 +0,0 @@
name: Tests
on: [push, pull_request]

jobs:
  old-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.6','3.7']
    name: Python ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - run: pip install nox==2019.11.9
      - run: pip install poetry==1.0.5
      - run: nox
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v1
        with:
          python-version: '3.8'
          architecture: x64
      - run: pip3 install nox==2019.11.9
      - run: pip3 install poetry==1.0.5
      - run: nox --sessions tests coverage
        env:
          CODECOV_TOKEN: ${{secrets.CODECOV_TOKEN}}

.github/workflows/tests2.yml (vendored, new file, +19 lines)

@@ -0,0 +1,19 @@
name: Tests
on: [push, pull_request]

jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.7', '3.8']
        os: [ubuntu-latest, macos-latest, windows-latest]
    name: Python ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - run: pip install nox==2019.11.9
      - run: pip install poetry==1.0.5
      - run: nox

.gitignore (vendored, 25 changes)

@@ -76,7 +76,7 @@ MANIFEST
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
@@ -194,6 +194,29 @@ poetry.lock
# PyCharm
.idea/
ciphey/LanguageChecker/create\?max_size=60&spelling=US&spelling=GBs&max_variant=2&diacritic=both&special=hacker&special=roman-numerals&download=wordlist&encoding=utf-8&format=inline
ciphey/LanguageChecker/create\?max_size=80&spelling=US&spelling=GBs&spelling=GBz&spelling=CA&spelling=AU&max_variant=2&diacritic=both&special=hacker&special=roman-numerals&download=wordlist&encoding=utf-8&format=inline
ciphey/LanguageChecker/aspell.txt
dictionary.txt
aspell.txt
ciphey.spec
ciphey/__main__.spec
__main__.spec
.entry_point.spec/entry_point.spec
BEANOS INVADES THE FORTNITE ITEM SHOP AT 8_00 PM.EXE-uG0WJcr-cuI.f299.mp4.part
run.yml
tests/interface.rst
# Test Generator
test_main_generated.py

@@ -1,3 +1,48 @@
Howdy!
So, you're interested in contributing to Ciphey? 🤔
But maybe you're confused as to where to start, or you believe your coding skills aren't "good enough". Well, for the latter - that's ridiculous! We're perfectly okay with "bad code", and even then, if you're reading this document you're probably a great programmer. I mean, newbies don't often learn to contribute to GitHub projects 😉
Here are some ways you can contribute to Ciphey:
* Add a new language 🧏
* Add more encryption methods 📚
* Create more documentation (very important‼ We would be eternally grateful)
* Fix bugs submitted via GitHub issues (we can support you in this 😊)
* Refactor the code base 🥺
If these sound hard, do not worry! This document will walk you through exactly how to achieve any of these. And also.... Your name will be added to Ciphey's contributors list, and we'll be eternally grateful! 🙏
We have a small Discord chat for you to talk to the developers and get some help. Alternatively, write a GitHub issue for your suggestion. If you want to be added to the Discord, DM us or ask us somehow.
[Discord Server](https://discord.gg/KfyRUWw)
# Add a new language 🧏
The default language checker, `brandon`, works with multiple languages. Now, this may sound daunting.
But honestly, all you've got to do is take a dictionary, do a little analysis (we've written code to help you with this), add the dictionaries and analysis to a repo. And then add the option to `settings.yml`.
When I created the German module, I wrote detailed documentation on how I did it. You can read that here.
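As a rough illustration of the kind of "little analysis" described above (the function and example names here are hypothetical, not Ciphey's actual API), a dictionary-based language check can be as simple as measuring what fraction of a text's words appear in the dictionary:

```python
# Hypothetical sketch of a dictionary analysis step: score a text by the
# fraction of its words found in a language dictionary. Names are
# illustrative only, not Ciphey's real interface.
def dictionary_hit_rate(text: str, dictionary: set) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in dictionary)
    return hits / len(words)


# tiny stand-in dictionary for the example
english = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}
print(dictionary_hit_rate("the quick brown fox", english))  # 1.0
```

A real module would load a full wordlist per language and tune a hit-rate threshold for it, which is the analysis the helper code mentioned above assists with.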
# Add more encryption methods 📚
# Create more documentation
Documentation is the most important part of Ciphey. No documentation is extreme code debt, and we don't want that.
And trust me when I say, if you contribute to great documentation you will be seen on the same level as code contributors. Documentation is absolutely vital.
There's lots of ways you can add documentation.
* Doc strings in the code
* Improving our current documentation (README, this file, our Read The Docs pages)
* Translating documentation
And much more!
# Fix Bugs
Visit our GitHub issues page to find all the bugs Ciphey has! And squash them, you'll be added to the contributors list ;)
# Refactor the code base
Not all of Ciphey follows PEP8, and some of the code is repeated.
# How to contribute
Ciphey is always in need of more decryption tools!
1. Write a decryption tool (this can include encodings such as Base64 too). Make sure it has a `decrypt` function and is a class.
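To make that shape concrete, here is a minimal sketch of such a class. It follows the result-dictionary convention visible elsewhere in this PR's diff (`"IsPlaintext?"`, `"Plaintext"`, `"Cipher"`, `"Extra Information"`); the interface was being reworked in this 5.0.0 release, so treat it as illustrative rather than the exact API. The `FakeChecker` stand-in is invented for the example:

```python
# Sketch of a decryption tool: a class with a `decrypt` method returning
# the result-dict shape used by the existing decoders in this PR.
import codecs


class Rot13:
    def __init__(self, lc):
        # lc is the language checker used to decide when output is plaintext
        self.lc = lc

    def decrypt(self, text: str):
        result = codecs.decode(text, "rot_13")
        if self.lc.check(result):
            return {
                "lc": self.lc,
                "IsPlaintext?": True,
                "Plaintext": result,
                "Cipher": "ROT13",
                "Extra Information": None,
            }
        return {
            "lc": self.lc,
            "IsPlaintext?": False,
            "Plaintext": None,
            "Cipher": None,
            "Extra Information": None,
        }


class FakeChecker:
    # stand-in language checker, invented for this example
    def check(self, text: str) -> bool:
        return "hello" in text


print(Rot13(FakeChecker()).decrypt("uryyb jbeyq")["Plaintext"])  # hello world
```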

PKGBUILD.proto (new file, +50 lines)

@@ -0,0 +1,50 @@
# This is an example PKGBUILD file. Use this as a start to creating your own,
# and remove these comments. For more information, see 'man PKGBUILD'.
# NOTE: Please fill out the license field for your package! If it is unknown,
# then please put 'unknown'.
# Maintainer: Ciphey <brandon@skerritt.blog>
pkgname=Ciphey
pkgver='4.2.1'
pkgrel=1
pkgdesc="Automated Decryption Tool"
arch=('any')
url="https://github.com/ciphey/ciphey"
license=('MIT')
depends=('python>=3.7')
makedepends=('python>=3.7')
# checkdepends=()
# optdepends=()
# provides=()
# conflicts=()
# replaces=()
# backup=('')
# options=()
install=
changelog=
source=("$pkgname-$pkgver.tar.gz"
"$pkgname-$pkgver.patch")
noextract=()
md5sums=()
validpgpkeys=()
prepare() {
    cd "$pkgname-$pkgver"
    patch -p1 -i "$srcdir/$pkgname-$pkgver.patch"
}

build() {
    cd "$pkgname-$pkgver"
    ./configure --prefix=/usr
    make
}

check() {
    cd "$pkgname-$pkgver"
    make -k check
}

package() {
    cd "$pkgname-$pkgver"
    make DESTDIR="$pkgdir/" install
}

(10 binary image files changed, not shown in this view: 9 added, 1 modified; sizes range from 756 B to 2.0 MiB.)

README.md (160 changes)

@@ -1,61 +1,126 @@
<p align="center"><a href="https://docs.ciphey.online">Documentation</a> <a href="https://discord.ciphey.online">Discord</a></p>
<p align="center">
➡️
<a href="https://docs.ciphey.online">Documentation</a> |
<a href="https://discord.ciphey.online">Discord</a> |
<a href="https://docs.ciphey.online/en/latest/install.html">Installation Guide</a>
⬅️
<br>
<img src="Pictures_for_README/binoculars.png" alt="Ciphey">
</p>
<p align="center">
<img alt="GitHub commit activity" src="https://img.shields.io/github/commit-activity/m/ciphey/ciphey">
<img src="https://pepy.tech/badge/ciphey">
<img src="https://pepy.tech/badge/ciphey/month">
<a href="https://discord.gg/wM3scnc"><img alt="Discord" src="https://img.shields.io/discord/728245678895136898"></a>
<a href="https://pypi.org/project/ciphey/"><img src="https://img.shields.io/pypi/v/ciphey.svg"></a>
<img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="Ciphey">
<img src="https://github.com/brandonskerritt/Ciphey/workflows/Python%20application/badge.svg?branch=master" alt="Ciphey">
<img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/ciphey">
<img src="https://codecov.io/gh/ciphey/ciphey/branch/master/graph/badge.svg">
<a href="https://ciphey.readthedocs.io/"><img src="https://readthedocs.org/projects/ciphey/badge/"></a>
<img src="https://img.shields.io/badge/all_contributors-1-orange.svg?style=flat-square">
<br>
Fully automated decryption tool using natural language processing & artificial intelligence, along with some common sense.
</p>
<hr>
# What is this?
## [Installation Guide](https://docs.ciphey.online/en/latest/install.html)
| <p align="center"><a href="https://pypi.org/project/ciphey">🐍 Python (Universal) </a></p> | <p align="center"><a href="https://pypi.org/project/ciphey"> Arch </a></p> | <p align="center"><a href="https://pypi.org/project/ciphey"> Windows </a></p> | <p align="center"><a href="https://pypi.org/project/ciphey"> Mac OS </a></p> |
| ----------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| <p align="center"><img src="Pictures_for_README/python.png" /></p> | <p align="center"><img src="Pictures_for_README/arch.png" /></p> | <p align="center"><img src="Pictures_for_README/windows.png" /></p> | <p align="center"><img src="Pictures_for_README/apple.png" /></p> |
| `python3 -m pip install ciphey --upgrade` | `yay ciphey` | `winget ciphey` | `brew ciphey` |
| Linux | Mac OS | Windows |
| ----------- | ------ | ----------- |
| ![GitHub Workflow Status](https://img.shields.io/github/workflow/status/ciphey/ciphey/Python%20application?label=Linux) |![GitHub Workflow Status](https://img.shields.io/github/workflow/status/ciphey/ciphey/Python%20application?label=Mac%20OS) | ![GitHub Workflow Status](https://img.shields.io/github/workflow/status/ciphey/ciphey/Python%20application?label=Windows) |
<hr>
# 🤔 What is this?
Ciphey is an automated decryption tool. Input encrypted text, get the decrypted text back.
> "What type of encryption?"
That's the point. You don't know, you just know it's possibly encrypted. Ciphey will figure it out for you.
Ciphey uses a deep neural network with a simple filtration system to approximate what something is encrypted with. And then a custom-built, customisable natural language processing Language Checker Interface, which can detect when the given text becomes plaintext.
Ciphey can solve most things in 3 seconds or less.
Ciphey can solve most things in about 2 seconds.
<p align="center" href="https://asciinema.org/a/336257">
<img src="Pictures_for_README/index.gif" alt="Ciphey demo">
</p>
# Features
**The technical part.** Ciphey uses a custom built artificial intelligence module (_AuSearch_) with a _Cipher Detection Interface_ to approximate what something is encrypted with. And then a custom-built, customisable natural language processing _Language Checker Interface_, which can detect when the given text becomes plaintext.
- **20+ encryptions supported** such as hashes, encodings (binary, base64) and normal encryptions like Caesar cipher, Transposition and more. **[For the full list, click here](https://ciphey.readthedocs.io/en/latest/ciphers.html)**
- **Deep neural network for targeting the right decryption** resulting in decryptions taking less than 3 seconds. If Ciphey cannot decrypt the text, Ciphey will use the neural network analysis to give you information on how to decrypt it yourself.
And that's just the tip of the iceberg. For the full technical explanation, check out our [documentation](https://docs.ciphey.online/en/latest/howWork.html).
# ✨ Features
- **20+ encryptions supported** such as hashes, encodings (binary, base64) and normal encryptions like Caesar cipher, Transposition and more. **[For the full list, click here](https://docs.ciphey.online/en/latest/ciphers.html)**
- **Custom Built Artificial Intelligence with Augmented Search (AuSearch) for answering the question "what encryption was used?"** Resulting in decryptions taking less than 3 seconds.
- **Custom built natural language processing module** Ciphey can determine whether something is plaintext or not. It has an incredibly high accuracy, along with being fast.
- **Multi Language Support** at present, only English.
- **Supports hashes & encryptions** Which the alternatives such as CyberChef do not.
- **Multi Language Support** at present, only German & English (with AU, UK, CAN, USA variants).
- **Supports hashes & encryptions** Which the alternatives such as CyberChef Magic do not.
- **[C++ core](https://github.com/Ciphey/CipheyCore)** Blazingly fast.
# 🔭 Ciphey vs CyberChef
# Getting Started
## Installation
### Pip
```python3 -m pip install -U ciphey```
## 🔁 Base64 Encoded 64 times
The -U is needed, as sometimes PyPI gets stuck on an older version.
<table>
<tr>
<th>Name</th>
<th>⚡ Ciphey ⚡ </th>
<th>🐢 CyberChef 🐢</th>
</tr>
<tr>
<th>Gif</th>
<td><img src="Pictures_for_README/ciphey_vs_cyberchef.gif" alt="The guy she tells you not to worry about"></td>
<td><img src="Pictures_for_README/not_dying.gif" alt="You"></td>
</tr>
<tr>
<th>Time</th>
<td>4 seconds</td>
<td>6 seconds</td>
</tr>
<tr>
<th>Setup</th>
<td><ul><li>Set the regex param to "{"</li></ul></td>
<td><ul><li>Set the regex param to "{"</li><li>You need to know how many times to recurse</li><li>You need to know it's Base64 all the way down</li><li>You need to load CyberChef (it's a bloated JS app)</li><li>Know enough about CyberChef to create this pipeline</li><li>Invert the match</li></ul></td>
</tr>
</table>
```ciphey -t "encrypted text here"```
To run ciphey.
### Cloning from GitHub
<sub><b>Note</b> The gifs may load at different times, so one may appear significantly faster than another.</sub><br>
<sub><b>A note on magic </b>CyberChef's most similar feature to Ciphey is Magic. Magic fails instantly on this input and crashes. The only way we could force CyberChef to compete was to manually define it.</sub>
```
git clone https://github.com/Ciphey/Ciphey
cd Ciphey
python3 -m ciphey -t "encrypted text here"
```
### Running Ciphey
We also tested CyberChef and Ciphey with a **6gb file**. Ciphey cracked it in **5 minutes and 54 seconds**. CyberChef crashed before it even started.
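As a toy illustration of the problem being benchmarked here (this is not Ciphey's implementation), recursive Base64 can be peeled with the standard library without knowing the depth in advance, by decoding until the data stops being valid Base64:

```python
# Toy sketch: strip nested Base64 layers without a known recursion depth.
# Stops when the data is no longer valid Base64 (assumed to be plaintext).
import base64
import binascii


def peel_base64(data: str, max_depth: int = 100) -> str:
    for _ in range(max_depth):
        try:
            decoded = base64.b64decode(data, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            break  # not valid Base64 any more: assume we've hit plaintext
        data = decoded
    return data


# encode a message 64 times, then recover it with no depth parameter
msg = "hello there"
blob = msg
for _ in range(64):
    blob = base64.b64encode(blob.encode()).decode()

print(peel_base64(blob))  # hello there
```

Ciphey's real search is far more general (it must also decide *which* decoder applies at each step), but this captures why no recursion count needs to be supplied.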
## 📊 Ciphey vs Katana vs CyberChef Magic
| **Name** | ⚡ Ciphey ⚡ | 🤡 Katana 🤡 | 🐢 CyberChef Magic 🐢 |
| ------------------------------------------ | ---------- | ---------- | ------------------- |
| Advanced Language Checker | ✅ | ❌ | ✅ |
| Supports Encryptions | ✅ | ✅ | ❌ |
| Releases named after Dystopian themes 🌃 | ✅ | ❌ | ❌ |
| Supports hashes | ✅ | ✅ | ❌ |
| Easy to set up | ✅ | ❌ | ✅ |
| Can guess what something is encrypted with | ✅ | ❌ | ❌ |
| Created for hackers by hackers | ✅ | ✅ | ❌ |
# 🎬 Getting Started
If you're having trouble with installing Ciphey, [read this.](https://docs.ciphey.online/en/latest/install.html)
## ‼️ Important Links (Docs, Installation guide, Discord Support)
| Installation Guide | Documentation | Discord |
| ------------------ | ------------- | ------- |
| 📖 [Installation Guide](https://docs.ciphey.online/en/latest/install.html) | 📚 [Documentation](https://docs.ciphey.online) | 🦜 [Discord](https://discord.ciphey.online)
## Running Ciphey
There are 3 ways to run Ciphey.
1. File Input `ciphey - encrypted.txt`
2. Unqualified input `ciphey -- "Encrypted input"`
@@ -63,26 +128,35 @@ There are 3 ways to run Ciphey.
![Gif showing 3 ways to run Ciphey](Pictures_for_README/3ways.gif)
To get rid of the progress bars, probability table, and all the noise use the grep mode.
To get rid of the progress bars, probability table, and all the noise use the quiet mode.
```ciphey -t "encrypted text here" -g```
```ciphey -t "encrypted text here" -q```
For a full list of arguments, run `ciphey --help`.
### Importing Ciphey
You can import Ciphey's __main__ and use it in your own programs and code.
This feature is expected to expand in the next major version.
# Docs
The docs are located at [https://ciphey.readthedocs.io/en/latest/](https://ciphey.readthedocs.io/en/latest/)
### ⚗️ Importing Ciphey
You can import Ciphey's main and use it in your own programs and code. `from Ciphey.__main__ import main`
# Contributors
Ciphey was invented by [Brandon Skerritt](https://github.com/brandonskerritt) way back in 2008 (don't worry, the code has upgraded since then 😜) but it wouldn't be where it is today without [Cyclic3](https://github.com/Cyclic3).
## Contributing
Please read the [contributing file](https://github.com/Ciphey/Ciphey/blob/master/CONTRIBUTING.md) or submit an issue and we can help you.
## Financial Contributors
Please donate to us, we're students and we want Huel.
# 🎪 Contributors
Ciphey was invented by [Brandon Skerritt](https://github.com/brandonskerritt) in 2008, and revived in 2019. Ciphey wouldn't be where it was today without [Cyclic3](https://github.com/Cyclic3) - president of UoL's Cyber Security Society.
## Contributors ✨
Ciphey was revived & recreated by the [Cyber Security Society](https://www.cybersoc.cf/) for use in CTFs. If you're ever in Liverpool, consider giving a talk or sponsoring our events. Email us at `cybersecurity@society.liverpoolguild.org` to find out more 🤠
**Major Credit** to George H for designing the searching algorithm among other things.
**Special thanks** to [varghalladesign](https://www.facebook.com/varghalladesign) for designing the logo. Check out their other design work!
## 🐕‍🦺 [Contributing](CONTRIBUTING.md)
Don't be afraid to contribute! We have many, many things you can do to help out. Each of them labelled and easily explained with examples. If you're trying to contribute but stuck, tag @brandonskerritt in the GitHub issue ✨
Alternatively, join the Discord group and send a message there (link in [contrib file](CONTRIBUTING.md)) or at the top of this README as a badge.
Please read the [contributing file](CONTRIBUTING.md) for exact details on how to contribute ✨
## 💰 Financial Contributors
The contributions will be used to fund not only the future of Ciphey and its authors, but also Cyber Security Society at the University of Liverpool.
GitHub doesn't support "sponsor this project and we'll evenly distribute the money", so pick a link and we'll sort it out on our end 🥰
## ✨ Contributors
Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

ciphey.rb (new file, +12 lines)

@@ -0,0 +1,12 @@
class Ciphey < Formula
  desc "Automated decryption tool"
  homepage ""
  url "https://github.com/Ciphey/Ciphey/archive/4.2.0.tar.gz"
  sha256 "013c438cc1f1c34c314bb202209acb36d9da142d4febeb21e5d4a06fa7b4dd7c"

  def install
    bin.install "ciphey"
  end
end

@@ -30,7 +30,7 @@ class Ascii:
                 "Extra Information": None,
             }

-        if self.lc.checkLanguage(result):
+        if self.lc.check(result):
             logger.debug(f"English found in ASCII, returning {result}")
             return {
                 "lc": self.lc,

View File

@ -1,156 +0,0 @@
import base64
import binascii
from typing import Callable
from loguru import logger
import base58
import base62
class Bases:
"""
turns base64 strings into normal strings
"""
def __init__(self, lc):
self.lc = lc
def decrypt(self, text: str):
logger.debug("Attempting base decoding")
bases = [
self.base32(text),
self.base16(text),
self.base64(text),
self.base85(text),
self.ascii85(text),
self.base58_bitcoin(text),
self.base58_ripple(text, alphabet=base58.RIPPLE_ALPHABET),
self.b62(text),
]
for answer in bases:
try:
if answer["IsPlaintext?"]:
# good answer
logger.debug(f"Returning true for {answer}")
return answer
except TypeError:
continue
# Base85
# if nothing works, it has failed.
return self.badRet()
def _dispatch(
self, decoder: Callable[[str], bytes], text: str, cipher: str, alphabet=None
):
logger.trace("Attempting base64")
result = None
try:
result = decoder(text) if not alphabet else decoder(text, alphabet)
# yeet turning b strings into normal stringy bois
result = result.decode("utf-8")
except UnicodeDecodeError as e:
logger.trace("Bad unicode")
result = None
except binascii.Error as e:
logger.trace("binascii error")
result = None
except ValueError:
logger.trace("Failed to decode base")
result = None
except:
logger.trace("Failed to decode base")
result = None
if result is not None and self.lc.checkLanguage(result):
logger.debug(f"Bases successful, returning {result}")
return self.goodRet(result, cipher=cipher)
else:
return self.badRet()
def base64(self, text: str):
"""Base64 decode
args:
text -> text to decode
returns:
the text decoded as base64
"""
logger.trace("Attempting base64")
return self._dispatch(base64.b64decode, text, "base64")
def base32(self, text: str):
"""Base32 decode
args:
text -> text to decode
returns:
the text decoded as base32
"""
logger.trace("Attempting base32")
return self._dispatch(base64.b32decode, text, "base32")
def base16(self, text: str):
"""Base16 decode
args:
text -> text to decode
returns:
the text decoded as base16
"""
logger.trace("Attempting base16")
return self._dispatch(base64.b16decode, text, "base16")
def base85(self, text: str):
"""Base85 decode
args:
text -> text to decode
returns:
the text decoded as base85
"""
logger.trace("Attempting base85")
return self._dispatch(base64.b85decode, text, "base85")
def ascii85(self, text: str):
"""Ascii85 decode
args:
text -> text to decode
returns:
the text decoded as ascii85
"""
logger.trace("Attempting ascii85")
return self._dispatch(base64.a85decode, text, "base85")
def base58_bitcoin(self, text: str):
logger.trace("Attempting Base58 Bitcoin")
return self._dispatch(base58.b58decode, text, "base58_bitcoin")
def base58_ripple(self, text: str, alphabet: str):
logger.trace("Attempting Base58 ripple alphabet")
return self._dispatch(base58.b58decode, text, "base58_ripple", alphabet=alphabet)
def b62(self, text: str):
logger.trace("Attempting base62")
return self._dispatch(base62.decode, text, "base62")
def goodRet(self, result, cipher):
logger.debug(f"Result for base is true, where result is {result}")
return {
"lc": self.lc,
"IsPlaintext?": True,
"Plaintext": result,
"Cipher": cipher,
"Extra Information": None,
}
def badRet(self):
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": None,
"Extra Information": None,
}
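The try-each-base pattern above can be sketched standalone with only the standard library; `try_base_decode` and the candidate list below are illustrative names, not part of the real class:

```python
import base64
import binascii


def try_base_decode(decoder, text: str):
    """Attempt one base decoding, returning a str or None on failure."""
    try:
        return decoder(text).decode("utf-8")
    except (binascii.Error, ValueError, UnicodeDecodeError):
        return None


# Try several bases in turn, keeping the first that decodes cleanly
candidates = [base64.b32decode, base64.b16decode, base64.b64decode]
results = [try_base_decode(d, "aGVsbG8=") for d in candidates]
answer = next((r for r in results if r is not None), None)  # "hello" via base64
```

Because `binascii.Error` is a subclass of `ValueError`, catching both in one tuple is harmless and makes the intent explicit.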

@@ -1,62 +0,0 @@
import binascii
from loguru import logger
class Binary:
def __init__(self, lc):
self.lc = lc
def decrypt(self, text):
logger.debug("Attempting to decrypt binary")
try:
result = self.decode(text)
except ValueError as e:
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": None,
"Extra Information": None,
}
except TypeError as e:
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": None,
"Extra Information": None,
}
if self.lc.checkLanguage(result):
logger.debug(f"Answer found for binary")
return {
"lc": self.lc,
"IsPlaintext?": True,
"Plaintext": result,
"Cipher": "Ascii to Binary encoded",
"Extra Information": None,
}
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": None,
"Extra Information": None,
}
def decode(self, text):
"""
Decodes a string of binary digits into ASCII text
"""
text = text.replace(" ", "")
# to a bytes string
text = text.encode("utf-8")
# into base 2
n = int(text, 2)
# into ascii
text = n.to_bytes((n.bit_length() + 7) // 8, "big").decode()
return text
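The int-based conversion in `decode` can be exercised on its own; a minimal sketch (`binary_to_ascii` is a hypothetical helper name):

```python
def binary_to_ascii(text: str) -> str:
    """Decode a string of binary digits (spaces optional) into ASCII text."""
    n = int(text.replace(" ", ""), 2)  # parse the digits as one base-2 integer
    # pack the integer back into bytes, then decode those bytes as text
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode()


# "01101000 01101001" is 0x68 0x69, i.e. "hi"
print(binary_to_ascii("01101000 01101001"))  # hi
```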

@@ -1,4 +1,3 @@
from .bases import Bases
from .binary import Binary
from .hexadecimal import Hexadecimal
from .ascii import Ascii
@@ -11,7 +10,6 @@ from loguru import logger
class EncodingParent:
def __init__(self, lc):
self.lc = lc
self.base64 = Bases(self.lc)
self.binary = Binary(self.lc)
self.hex = Hexadecimal(self.lc)
self.ascii = Ascii(self.lc)
@@ -40,7 +38,7 @@ class EncodingParent:
for answer in answers:
logger.debug(f"All answers are {answers}")
-# adds the LC objects together
+# adds the Checkers objects together
# self.lc = self.lc + answer["lc"]
if answer is not None and answer["IsPlaintext?"]:
logger.debug(f"Plaintext found {answer}")

@@ -1,29 +0,0 @@
from loguru import logger
class Hexadecimal:
def __init__(self, lc):
self.lc = lc
def decrypt(self, text):
logger.debug("Attempting hexadecimal decryption")
try:
result = bytearray.fromhex(text).decode()
except ValueError as e:
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": None,
"Extra Information": None,
}
if self.lc.checkLanguage(result):
logger.debug(f"Hexadecimal successful, returning {result}")
return {
"lc": self.lc,
"IsPlaintext?": True,
"Plaintext": result,
"Cipher": "Ascii to Hexadecimal encoded",
"Extra Information": None,
}

@@ -0,0 +1,27 @@
class letters:
"""Deals with Nato Strings / first letter of every word"""
def __init__(self):
pass
def __name__(self):
return "Letters"
def decrypt(self, text: str) -> dict:
# TODO: not implemented yet; currently returns the input unchanged
return text
def first_letter_every_word(self, text):
"""
This should be supplied a string like "hello my name is"
"""
text = text.split(".")
new_text = []
for sentence in text:
for word in sentence.split(" "):
if word:
new_text.append(word[0])
# Applies a space after every sentence
# which might be every word
new_text.append(" ")
return "".join(new_text).strip()

@@ -1,61 +0,0 @@
from loguru import logger
import cipheydists
class MorseCode:
def __init__(self, lc):
self.lc = lc
self.ALLOWED = {".", "-", " ", "/", "\n"}
self.MORSE_CODE_DICT = dict(cipheydists.get_charset("morse"))
self.MORSE_CODE_DICT_INV = {v: k for k, v in self.MORSE_CODE_DICT.items()}
def decrypt(self, text):
logger.debug("Attempting morse code")
if not self.checkIfMorse(text):
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": "Morse Code",
"Extra Information": None,
}
try:
result = self.unmorse_it(text)
except TypeError as e:
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": "Morse Code",
"Extra Information": None,
}
logger.debug(f"Morse code successful, returning {result}")
return {
"lc": self.lc,
"IsPlaintext?": True,
"Plaintext": result,
"Cipher": "Morse Code",
"Extra Information": None,
}
def checkIfMorse(self, text):
if not text:
return False
count = 0
for i in text:
if i in self.ALLOWED:
count += 1
return count / len(text) > 0.625
def unmorse_it(self, text):
returnMsg = ""
for word in text.split("/"):
for char in word.strip().split():
# translates every letter
try:
m = self.MORSE_CODE_DICT_INV[char]
except KeyError:
m = ""
returnMsg = returnMsg + m
# after every word add a space
returnMsg = returnMsg + " "
return returnMsg.strip().upper()
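The `unmorse_it` translation loop, sketched with a tiny hand-written subset of the Morse table (the real class loads a full charset from cipheydists):

```python
# A small hand-rolled subset of the inverse Morse table, for illustration only
MORSE_INV = {"....": "H", ".": "E", ".-..": "L", "---": "O", ".--": "W",
             ".-.": "R", "-..": "D"}


def unmorse(text: str) -> str:
    words = []
    for word in text.split("/"):
        # translate each letter, skipping unknown symbols
        words.append("".join(MORSE_INV.get(ch, "") for ch in word.strip().split()))
    return " ".join(words)


print(unmorse(".... . .-.. .-.. --- / .-- --- .-. .-.. -.."))  # HELLO WORLD
```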

@@ -46,11 +46,11 @@ class Octal:
}
def decode(self, text):
-'''
+"""
It takes an octal string and return a string
:octal_str: octal str like "110 145 154"
-'''
+"""
str_converted = ""
for octal_char in text.split(" "):
str_converted += chr(int(octal_char, 8))
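The per-character octal conversion can be sketched as a one-line helper (`octal_to_ascii` is an illustrative name):

```python
def octal_to_ascii(text: str) -> str:
    """Decode space-separated octal character codes into a string."""
    return "".join(chr(int(ch, 8)) for ch in text.split(" "))


# 110 145 154 154 157 are the octal codes for H e l l o
print(octal_to_ascii("110 145 154 154 157"))  # Hello
```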

@@ -99,7 +99,7 @@ def crack(hashvalue, lc):
if len(hashvalue) == 32:
for api in md5:
r = api(hashvalue, "md5")
-result = lc.checkLanguage(r) if r is not None else None
+result = lc.check(r) if r is not None else None
if result is not None or r is not None:
logger.debug(f"MD5 returns True {r}")
return {
@@ -112,7 +112,7 @@ def crack(hashvalue, lc):
elif len(hashvalue) == 40:
for api in sha1:
r = api(hashvalue, "sha1")
-result = lc.checkLanguage(r) if r is not None else None
+result = lc.check(r) if r is not None else None
if result is not None and r is not None:
logger.debug(f"sha1 returns true")
return {
@@ -125,7 +125,7 @@ def crack(hashvalue, lc):
elif len(hashvalue) == 64:
for api in sha256:
r = api(hashvalue, "sha256")
-result = lc.checkLanguage(r) if r is not None else None
+result = lc.check(r) if r is not None else None
if result is not None and r is not None:
logger.debug(f"sha256 returns true")
return {
@@ -138,7 +138,7 @@ def crack(hashvalue, lc):
elif len(hashvalue) == 96:
for api in sha384:
r = api(hashvalue, "sha384")
-result = lc.checkLanguage(r) if r is not None else None
+result = lc.check(r) if r is not None else None
if result is not None and r is not None:
logger.debug(f"sha384 returns true")
return {
@@ -151,7 +151,7 @@ def crack(hashvalue, lc):
elif len(hashvalue) == 128:
for api in sha512:
r = api(hashvalue, "sha512")
-result = lc.checkLanguage(r) if r is not None else None
+result = lc.check(r) if r is not None else None
if result is not None and r is not None:
logger.debug(f"sha512 returns true")
return {

@@ -1,117 +0,0 @@
"""
© Brandon Skerritt
https://github.com/brandonskerritt/ciphey
"""
try:
import Decryptor.basicEncryption.caesar as ca
import Decryptor.basicEncryption.reverse as re
import Decryptor.basicEncryption.vigenere as vi
import Decryptor.basicEncryption.pigLatin as pi
import Decryptor.basicEncryption.transposition as tr
except ModuleNotFoundError:
import ciphey.Decryptor.basicEncryption.caesar as ca
import ciphey.Decryptor.basicEncryption.reverse as re
import ciphey.Decryptor.basicEncryption.vigenere as vi
import ciphey.Decryptor.basicEncryption.pigLatin as pi
import ciphey.Decryptor.basicEncryption.transposition as tr
"""
So I want to assign the prob distribution to objects
so it makes sense to do this?
list of objects
for each item in the prob distribution
replace that with the appropriate object in the list?
So each object has a getName func that returns the name as a str
new_prob_dict = {}
for key, val in self.prob:
for obj in list:
if obj.getName() == key:
new_prob_dict[obj] = val
But I don't need to do all this, do I?
The dict comes in already sorted.
So why do I need the probability values if it's sorted?
It'd be easier if I make a list in the same order as the dict?
sooo
list_objs = [caeser, etc]
counter = 0
for key, val in self.prob:
for listCounter, item in enumerate(list_objs):
if item.getName() == key:
# moves the item
list_objs.insert(counter, list_objs.pop(listCounter))
counter = counter + 1
Eventually we get a sorted list of obj
"""
class BasicParent:
def __init__(self, lc):
self.lc = lc
self.caesar = ca.Caesar(self.lc)
self.reverse = re.Reverse(self.lc)
self.vigenere = vi.Vigenere(self.lc)
self.pig = pi.PigLatin(self.lc)
self.trans = tr.Transposition(self.lc)
self.list_of_objects = [self.caesar, self.reverse, self.pig, self.trans]
def decrypt(self, text):
self.text = text
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(16)
answers = pool.map(self.callDecrypt, self.list_of_objects)
"""for item in self.list_of_objects:
result = item.decrypt(text)
answers.append(result)"""
for answer in answers:
# adds the LC objects together
# self.lc = self.lc + answer["lc"]
if answer["IsPlaintext?"]:
return answer
# Vigenere is slow and threads in a pool cannot be killed,
# so it is run separately after the others
result = self.callDecrypt(self.vigenere)
if result["IsPlaintext?"]:
return result
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": None,
"Extra Information": None,
}
def callDecrypt(self, obj):
# helper that simply calls decrypt on the given object
return obj.decrypt(self.text)
def setProbTable(self, prob):
"""I'm still writing this"""
self.probabilityDistribution = prob
# we get a sorted list of objects :)
counter = 0
for key, val in self.probabilityDistribution.items():
for listCounter, item in enumerate(self.list_of_objects):
if item.getName() == key:
# moves the item into its sorted position
self.list_of_objects.insert(counter, self.list_of_objects.pop(listCounter))
counter = counter + 1
def __name__(self):
return "basicParent"
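The fan-out in `decrypt` (map every decrypter over a thread pool, return the first plaintext) can be sketched with stand-in decrypters; nothing below uses the real modules:

```python
from multiprocessing.dummy import Pool as ThreadPool


# Stand-in decrypters: each returns a result dict like the real modules do
def reverse_decrypt(text):
    return {"IsPlaintext?": text[::-1] == "hello", "Plaintext": text[::-1]}


def identity_decrypt(text):
    return {"IsPlaintext?": text == "hello", "Plaintext": text}


def first_plaintext(text, decrypters):
    # run every decrypter concurrently, then keep the first positive answer
    with ThreadPool(4) as pool:
        answers = pool.map(lambda d: d(text), decrypters)
    return next((a for a in answers if a["IsPlaintext?"]), None)


result = first_plaintext("olleh", [identity_decrypt, reverse_decrypt])
```

`multiprocessing.dummy` uses threads rather than processes, so lambdas and closures can be mapped without pickling.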

@@ -36,7 +36,7 @@ class Caesar:
for candidate in possible_keys:
translated = cipheycore.caesar_decrypt(message, candidate.key, group)
-result = self.lc.checkLanguage(translated)
+result = self.lc.check(translated)
if result:
logger.debug(f"Caesar cipher returns true {result}")
return {

@@ -49,7 +49,7 @@ class PigLatin:
# TODO find a way to return 2 variables
# this returns 2 variables in a tuple
-if self.lc.checkLanguage(message3AY):
+if self.lc.check(message3AY):
logger.debug("Pig latin 3AY returns True")
return {
"lc": self.lc,
@@ -58,7 +58,7 @@
"Cipher": "Pig Latin",
"Extra Information": None,
}
-elif self.lc.checkLanguage(messagepigWAY):
+elif self.lc.check(messagepigWAY):
logger.debug("Pig latin WAY returns True")
return {
"lc": self.lc,

@@ -1,42 +0,0 @@
import sys
sys.path.append("..")
try:
import mathsHelper as mh
except ModuleNotFoundError:
import ciphey.mathsHelper as mh
from loguru import logger
class Reverse:
def __init__(self, lc):
self.lc = lc
self.mh = mh.mathsHelper()
def decrypt(self, message):
logger.debug("In reverse")
message = self.mh.strip_puncuation(message)
message = message[::-1]
result = self.lc.checkLanguage(message)
if result:
logger.debug("Reverse returns True")
return {
"lc": self.lc,
"IsPlaintext?": True,
"Plaintext": message,
"Cipher": "Reverse",
"Extra Information": None,
}
else:
logger.debug(f"Reverse returns False")
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": "Reverse",
"Extra Information": None,
}
def getName(self):
return "Reverse"

@@ -34,7 +34,7 @@ class Transposition:
logger.debug(f"Transposition trying key {key}")
decryptedText = self.decryptMessage(key, message)
# if decrypted english is found, return them
-result = self.lc.checkLanguage(decryptedText)
+result = self.lc.check(decryptedText)
if result:
logger.debug("transposition returns true")
return {

@@ -1,224 +0,0 @@
import itertools, re
import cipheycore
import cipheydists
class Vigenere:
def __init__(self, lc):
self.LETTERS = "abcdefghijklmnopqrstuvwxyz"
self.SILENT_MODE = True # If set to True, program doesn't print anything.
self.NUM_MOST_FREQ_LETTERS = 4 # Attempt this many letters per subkey.
self.MAX_KEY_LENGTH = 16 # Will not attempt keys longer than this.
self.NONLETTERS_PATTERN = re.compile("[^A-Z]")
self.lc = lc
def decrypt(self, text):
result = self.hackVigenere(text)
if result is None:
return {
"lc": self.lc,
"IsPlaintext?": False,
"Plaintext": None,
"Cipher": "Vigenere",
"Extra Information": None,
}
return result
def findRepeatSequencesSpacings(self, message):
# Goes through the message and finds any 3 to 5 letter sequences
# that are repeated. Returns a dict with the keys of the sequence and
# values of a list of spacings (num of letters between the repeats).
# Use a regular expression to remove non-letters from the message:
message = self.NONLETTERS_PATTERN.sub("", message.upper())
# Compile a list of seqLen-letter sequences found in the message:
seqSpacings = {} # Keys are sequences, values are lists of int spacings.
for seqLen in range(3, 6):
for seqStart in range(len(message) - seqLen):
# Determine what the sequence is, and store it in seq:
seq = message[seqStart : seqStart + seqLen]
# Look for this sequence in the rest of the message:
for i in range(seqStart + seqLen, len(message) - seqLen):
if message[i : i + seqLen] == seq:
# Found a repeated sequence.
if seq not in seqSpacings:
seqSpacings[seq] = [] # Initialize a blank list.
# Append the spacing distance between the repeated
# sequence and the original sequence:
seqSpacings[seq].append(i - seqStart)
return seqSpacings
def getUsefulFactors(self, num):
# Returns a list of useful factors of num. By "useful" we mean factors
# less than MAX_KEY_LENGTH + 1 and not 1. For example,
# getUsefulFactors(144) returns [2, 3, 4, 6, 8, 9, 12, 16]
if num < 2:
return [] # Numbers less than 2 have no useful factors.
factors = [] # The list of factors found.
# When finding factors, you only need to check the integers up to
# MAX_KEY_LENGTH.
for i in range(2, self.MAX_KEY_LENGTH + 1): # Don't test 1: it's not useful.
if num % i == 0:
factors.append(i)
otherFactor = int(num / i)
if otherFactor < self.MAX_KEY_LENGTH + 1 and otherFactor != 1:
factors.append(otherFactor)
return list(set(factors)) # Remove duplicate factors.
def getItemAtIndexOne(self, items):
return items[1]
def getMostCommonFactors(self, seqFactors):
# First, get a count of how many times a factor occurs in seqFactors:
factorCounts = {} # Key is a factor, value is how often it occurs.
# seqFactors keys are sequences, values are lists of factors of the
# spacings. seqFactors has a value like: {'GFD': [2, 3, 4, 6, 9, 12,
# 18, 23, 36, 46, 69, 92, 138, 207], 'ALW': [2, 3, 4, 6, ...], ...}
for seq in seqFactors:
factorList = seqFactors[seq]
for factor in factorList:
if factor not in factorCounts:
factorCounts[factor] = 0
factorCounts[factor] += 1
# Second, put the factor and its count into a tuple, and make a list
# of these tuples so we can sort them:
factorsByCount = []
for factor in factorCounts:
# Exclude factors larger than MAX_KEY_LENGTH:
if factor <= self.MAX_KEY_LENGTH:
# factorsByCount is a list of tuples: (factor, factorCount)
# factorsByCount has a value like: [(3, 497), (2, 487), ...]
factorsByCount.append((factor, factorCounts[factor]))
# Sort the list by the factor count:
factorsByCount.sort(key=self.getItemAtIndexOne, reverse=True)
return factorsByCount
def kasiskiExamination(self, ciphertext):
# Find out the sequences of 3 to 5 letters that occur multiple times
# in the ciphertext. repeatedSeqSpacings has a value like:
# {'EXG': [192], 'NAF': [339, 972, 633], ... }
repeatedSeqSpacings = self.findRepeatSequencesSpacings(ciphertext)
# (See getMostCommonFactors() for a description of seqFactors.)
seqFactors = {}
for seq in repeatedSeqSpacings:
seqFactors[seq] = []
for spacing in repeatedSeqSpacings[seq]:
seqFactors[seq].extend(self.getUsefulFactors(spacing))
# (See getMostCommonFactors() for a description of factorsByCount.)
factorsByCount = self.getMostCommonFactors(seqFactors)
# Now we extract the factor counts from factorsByCount and
# put them in allLikelyKeyLengths so that they are easier to
# use later:
allLikelyKeyLengths = []
for twoIntTuple in factorsByCount:
allLikelyKeyLengths.append(twoIntTuple[0])
return allLikelyKeyLengths
def attemptHackWithKeyLength(self, ciphertext, mostLikelyKeyLength):
# Determine the most likely letters for each letter in the key:
ciphertext = ciphertext.lower()
# Do core work
group = cipheydists.get_charset("english")["lcase"]
expected = cipheydists.get_dist("lcase")
possible_keys = cipheycore.vigenere_crack(
ciphertext, expected, group, mostLikelyKeyLength
)
n_keys = len(possible_keys)
# Try all the feasible keys
for candidate in possible_keys:
nice_key = list(candidate.key)
# Create a possible key from the letters in allFreqScores:
if not self.SILENT_MODE:
print("Attempting with key: %s" % nice_key)
decryptedText = cipheycore.vigenere_decrypt(
ciphertext, candidate.key, group
)
if self.lc.checkLanguage(decryptedText):
# Set the hacked ciphertext to the original casing:
origCase = []
for i in range(len(ciphertext)):
if ciphertext[i].isupper():
origCase.append(decryptedText[i].upper())
else:
origCase.append(decryptedText[i].lower())
decryptedText = "".join(origCase)
# Check with user to see if the key has been found:
return {
"lc": self.lc,
"IsPlaintext?": True,
"Plaintext": decryptedText,
"Cipher": "Vigenere",
"Extra Information": f"The key used is {nice_key}",
}
# No English-looking decryption found, so return None:
return None
def hackVigenere(self, ciphertext):
# First, we need to do Kasiski Examination to figure out what the
# length of the ciphertext's encryption key is:
allLikelyKeyLengths = self.kasiskiExamination(ciphertext)
if not self.SILENT_MODE:
keyLengthStr = ""
for keyLength in allLikelyKeyLengths:
keyLengthStr += "%s " % (keyLength)
print(
"Kasiski Examination results say the most likely key lengths are: "
+ keyLengthStr
+ "\n"
)
hackedMessage = None
for keyLength in allLikelyKeyLengths:
if not self.SILENT_MODE:
print(
"Attempting hack with key length %s (%s possible keys)..."
% (keyLength, self.NUM_MOST_FREQ_LETTERS ** keyLength)
)
hackedMessage = self.attemptHackWithKeyLength(ciphertext, keyLength)
if hackedMessage != None:
break
# If none of the key lengths we found using Kasiski Examination
# worked, start brute-forcing through key lengths:
if hackedMessage == None:
if not self.SILENT_MODE:
print(
"Unable to hack message with likely key length(s). Brute forcing key length..."
)
for keyLength in range(1, self.MAX_KEY_LENGTH + 1):
# Don't re-check key lengths already tried from Kasiski:
if keyLength not in allLikelyKeyLengths:
if not self.SILENT_MODE:
print(
"Attempting hack with key length %s (%s possible keys)..."
% (keyLength, self.NUM_MOST_FREQ_LETTERS ** keyLength)
)
hackedMessage = self.attemptHackWithKeyLength(ciphertext, keyLength)
if hackedMessage != None:
break
return hackedMessage
def getName(self):
return "Vigenere"
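The `getUsefulFactors` step of the Kasiski examination can be exercised standalone; this sketch reproduces its logic with a module-level constant in place of the instance attribute:

```python
MAX_KEY_LENGTH = 16


def useful_factors(num: int) -> list:
    """Factors of num between 2 and MAX_KEY_LENGTH, as in the Kasiski step."""
    if num < 2:
        return []  # numbers below 2 have no useful factors
    factors = set()
    for i in range(2, MAX_KEY_LENGTH + 1):
        if num % i == 0:
            factors.add(i)
            other = num // i  # the cofactor may also be a usable key length
            if 1 < other <= MAX_KEY_LENGTH:
                factors.add(other)
    return sorted(factors)


print(useful_factors(144))  # [2, 3, 4, 6, 8, 9, 12, 16]
```

This matches the example in the original comment: `getUsefulFactors(144)` yields the candidate key lengths 2, 3, 4, 6, 8, 9, 12 and 16.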

@@ -1,4 +0,0 @@
# Re-expose interface for lazy people
from .iface import LanguageChecker

@@ -1,221 +0,0 @@
"""
© Brandon Skerritt
Github: brandonskerritt
Class to determine whether something is English or not.
1. Calculate the Chi Squared score of a sentence
2. If the score is significantly lower than the average score, it _might_ be English
2.1. If the score _might_ be English, then take the text and compare it to the sorted dictionary
in O(n log n) time.
It creates a percentage of "How much of this text is in the dictionary?"
The dictionary contains:
* 20,000 most common US words
* 10,000 most common UK words (there's no repetition between the two)
* The top 10,000 passwords
If the word "Looks like" English (chi-squared) and if it contains English words, we can conclude it is
very likely English. The alternative is doing the dictionary thing but with an entire 479k word dictionary (slower)
2.2. If the score is not English, but we haven't tested enough to create an average, then test it against
the dictionary
Things to optimise:
* We only run the dictionary if it's 20% smaller than the average for chi squared
* We consider it "English" if 45% of the text matches the dictionary
* We run the dictionary if there have been fewer than 10 chi squared tests in total
How to add a language:
* Download your desired dictionary. Try to make it the most popular words, for example. Place this file into this
folder with languagename.txt
As an example, this comes built in with english.txt
Find the statistical frequency of each letter in that language.
For English, we have:
self.languages = {
"English":
[0.0855, 0.0160, 0.0316, 0.0387, 0.1210,0.0218, 0.0209, 0.0496, 0.0733, 0.0022,0.0081, 0.0421, 0.0253, 0.0717,
0.0747,0.0207, 0.0010, 0.0633, 0.0673, 0.0894,0.0268, 0.0106, 0.0183, 0.0019, 0.0172,0.0011]
}
In chisquared.py
To add your language, do:
self.languages = {
"English":
[0.0855, 0.0160, 0.0316, 0.0387, 0.1210,0.0218, 0.0209, 0.0496, 0.0733, 0.0022,0.0081, 0.0421, 0.0253, 0.0717,
0.0747,0.0207, 0.0010, 0.0633, 0.0673, 0.0894,0.0268, 0.0106, 0.0183, 0.0019, 0.0172,0.0011]
"German": [0.0973]
}
In alphabetical order
And you're done! Make sure the two names match up.
"""
from typing import Dict, Set
from .iface import LanguageChecker
from string import punctuation
from loguru import logger
import string
import os
import sys
from loguru import logger
from .chisquared import chiSquared
import cipheydists
sys.path.append("..")
try:
import mathsHelper as mh
except ModuleNotFoundError:
import ciphey.mathsHelper as mh
class Brandon(LanguageChecker):
"""
Class designed to confirm whether something is **language** based on how many words of **language** appears
Call confirmLanguage(text, language)
* text: the text you want to confirm
* language: the language you want to confirm
Find out what language it is by using chisquared.py, the highest chisquared score is the language
languageThreshold = 0.55 by default
if at least that proportion of a string is **language** words, it's confirmed to be that language
"""
wordlist: set
def cleanText(self, text: str) -> set:
"""Cleans the text ready to be checked
Strips punctuation, makes it lower case, turns it into a set separated by spaces, removes duplicate words
Args:
text -> The text we use to perform analysis on
Returns:
text -> the text as a set, now cleaned
"""
# makes the text unique words and readable
text = text.lower()
text = self.mh.strip_puncuation(text)
text = text.split(" ")
text = set(text)
return text
def checkWordlist(self, text: Set[str]) -> float:
"""Returns the proportion of the words in text that appear in the wordlist
Args:
text -> The text we use to perform analysis on
Returns:
proportion -> the proportion of words in text that are in the wordlist
"""
# reads through most common words / passwords
# and calculates how much of that is in language
return len(text.intersection(self.wordlist)) / len(text)
def check1000Words(self, text: Set[str]) -> bool:
"""Checks whether any word in the text appears in the list of 1000 common words
the 1000 words are stored as a dict, so lookup is O(1)
Args:
text -> the set of words to check
Returns:
bool -> whether any of the words are in the dict
"""
# If we have no wordlist, then we can't reject the candidate on this basis
if self.top1000Words is None:
return True
if text is None:
return False
# If any of the top 1000 words in the text appear
# return true
for word in text:
# I was debating using any() here, but I think they're the
# same speed so it doesn't really matter too much
if word in self.top1000Words:
return True
return False
def confirmLanguage(self, text: set) -> bool:
"""Confirms whether given text is language
If the proportion (taken from checkDictionary) is higher than the language threshold, return True
Args:
text -> the set of words to check
language -> the language we check against
Returns:
bool -> whether it's written in Language or not
"""
proportion = self.checkWordlist(text)
if proportion >= self.languageThreshold:
logger.trace(f"The language proportion {proportion} is over the threshold {self.languageThreshold}")
return True
else:
logger.trace(f"The language proportion {proportion} is under the threshold {self.languageThreshold}")
return False
def __init__(self, config: dict):
# Suppresses warning
super().__init__(config)
self.mh = mh.mathsHelper()
self.languageThreshold = config["params"].get("threshold", 0.55)
self.top1000Words = config["params"].get("top1000")
self.wordlist = config["wordlist"]
def checkLanguage(self, text: str) -> bool:
"""Checks to see if the text is in English
Runs the 1000-words check, then the dictionary-proportion check.
Args:
text -> The text we use to perform analysis on
Returns:
bool -> True if the text is English, False otherwise.
"""
logger.trace(f"In Language Checker with \"{text}\"")
text = self.cleanText(text)
logger.trace(f"Text split to \"{text}\"")
if not text:
return False
if not self.check1000Words(text):
logger.debug("1000 words failed. This is not plaintext")
return False
logger.trace("1000words check passed")
if not self.confirmLanguage(text):
logger.debug("Dictionary check failed. This is not plaintext")
return False
logger.trace("Dictionary check passed. This is plaintext")
return True
@staticmethod
def getArgs() -> Dict[str, object]:
return {
"top1000": {"desc": "A json dictionary of the top 1000 words", "req": False},
"threshold": {"desc": "The minimum proportion (between 0 and 1) that must be in the dictionary", "req": False}
}
# Define alias
ciphey_language_checker = Brandon
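The clean-then-intersect flow of `cleanText` and `checkWordlist` can be sketched without the class; the wordlist below is a made-up toy set, not the real English dictionary:

```python
import string


def clean_text(text: str) -> set:
    """Lower-case, strip punctuation, and split into a set of unique words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(text.split())


def wordlist_proportion(text: str, wordlist: set) -> float:
    """Proportion of the unique words in text that appear in the wordlist."""
    words = clean_text(text)
    return len(words.intersection(wordlist)) / len(words)


wordlist = {"hello", "my", "name", "is", "the", "a"}
# 4 of the 5 unique words appear in the wordlist
print(wordlist_proportion("Hello, my name is Bee!", wordlist))  # 0.8
```

With a threshold of 0.55, a 0.8 proportion would be accepted as the target language.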

@@ -1,81 +0,0 @@
"""
© Brandon Skerritt
Github: brandonskerritt
Class calculates the Chi squared score
"""
from string import punctuation
from numpy import std
import sys
sys.path.append("..")
try:
import mathsHelper as mh
except ModuleNotFoundError:
import ciphey.mathsHelper as mh
from loguru import logger
import cipheycore
import cipheydists
# I had a bug where empty string was being added to letter freq dictionary
# this solves it :)
punctuation += " "
NUMBERS = "1234567890"
class chiSquared:
"""Class that calculates the Chi squared score and tries to work out what language it might be
to add a new language, go into this class (/app/languageChecker/chisquared.py)
Find "self.languages" and add it to the dictionary like "German":[0.789, 0.651...]
The list is the letter frequency ordered in alphabetical order """
def __init__(self):
self.language = cipheydists.get_dist("twist")
self.average = 0.0
self.totalDone = 0.0
self.oldAverage = 0.0
self.mh = mh.mathsHelper()
self.highestLanguage = ""
self.totalChi = 0.0
self.totalEqual = False
self.chisAsaList = []
# these are settings that may impact how the program works overall
self.chiSquaredSignificanceThreshold = 0.001 # The p value that we reject below
def checkChi(self, text):
"""Checks to see if the chi squared score is good
if it is, it returns True
Call this when you want to determine whether something is likely to be English or not
Arguments:
* text - the text you want to run a chi squared score on
Outputs:
* True - if the p-value is above the significance threshold
* False - if it is not (or None if the bytes cannot be decoded)
"""
if text is None:
return False
if isinstance(text, bytes):
try:
text = text.decode()
except UnicodeDecodeError:
return None
# runs after every chi squared test to see if the score is significantly lower than the average
analysis = cipheycore.analyse_string(text)
chisq = cipheycore.chisq_test(analysis, self.language)
logger.debug(f"Chi-squared p-value is {chisq}")
return chisq > self.chiSquaredSignificanceThreshold
def getMostLikelyLanguage(self):
"""Returns what the most likely language is
Only used when the threshold of checkChi is reached"""
return self.highestLanguage
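The real chi-squared analysis is delegated to cipheycore; as a rough standalone sketch, the raw chi-squared statistic over letter frequencies looks like this (the two-letter distribution is purely illustrative):

```python
from collections import Counter


def chi_squared_stat(text: str, expected_freqs: dict) -> float:
    """Plain chi-squared statistic of observed letter counts vs expected frequencies."""
    letters = [c for c in text.lower() if c in expected_freqs]
    counts = Counter(letters)
    total = len(letters)
    # sum of (observed - expected)^2 / expected over every letter in the distribution
    return sum(
        (counts.get(ch, 0) - total * p) ** 2 / (total * p)
        for ch, p in expected_freqs.items()
    )


# Toy two-letter "language": an even mix of a and b is a perfect fit (statistic 0)
freqs = {"a": 0.5, "b": 0.5}
print(chi_squared_stat("abab", freqs))  # 0.0
```

A skewed sample such as `"aaaa"` scores higher (4.0 here); the real checker converts such a statistic to a p-value and rejects candidates below its significance threshold.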

@@ -1,18 +0,0 @@
from abc import ABC, abstractmethod
from typing import Dict
class LanguageChecker(ABC):
@staticmethod
@abstractmethod
def getArgs(**kwargs) -> Dict[str, object]:
"""The returned dictionary must be of the format:
{<name:string>: {"req": <required:bool>, "desc": <description:string>}, ...}
"""
pass
@abstractmethod
def checkLanguage(self, text: str) -> bool: pass
@abstractmethod
def __init__(self, config: Dict[str, object]): pass

@@ -1 +1,7 @@
import __main__
from . import common
from . import iface
from . import basemods
from . import __main__

@@ -4,313 +4,147 @@
© Brandon Skerritt
https://github.com/brandonskerritt/ciphey
https://github.com/ciphey
https://docs.ciphey.online
The cycle goes:
main -> argparsing (if needed) -> call_encryption -> new Ciphey object -> decrypt() -> produceProbTable ->
one_level_of_decryption -> decrypt_normal
Ciphey can be called 3 ways:
echo 'text' | ciphey
ciphey 'text'
ciphey -t 'text'
main captures the first 2
argparsing captures the last one (-t)
it sends this to call_encryption, which can handle all 3 arguments using dict unpacking
decrypt() creates the prob table and prints it.
one_level_of_decryption() allows us to repeatedly call one_level_of_decryption on the inputs
so if something is doubly encrypted, we can use this to find it.
decrypt_normal is one round of decryption. We need one_level_of_decryption to call it, as
one_level_of_decryption handles progress bars and related output.
"""
import os
import warnings
import argparse
import sys
-from typing import Optional, Tuple, Dict
+from typing import Optional, Dict, Any, List
import bisect
from ciphey.iface import SearchLevel
from . import iface
from rich.console import Console
-from rich.table import Column, Table
+from rich.table import Table
from loguru import logger
import click
import click_spinner
warnings.filterwarnings("ignore")
# Depending on whether Ciphey is called, or Ciphey/__main__
# we need different imports to deal with both cases
try:
from ciphey.LanguageChecker import LanguageChecker as lc
from ciphey.neuralNetworkMod.nn import NeuralNetwork
from ciphey.Decryptor.basicEncryption.basic_parent import BasicParent
from ciphey.Decryptor.Hash.hashParent import HashParent
from ciphey.Decryptor.Encoding.encodingParent import EncodingParent
import ciphey.mathsHelper as mh
except ModuleNotFoundError:
from LanguageChecker import LanguageChecker as lc
from neuralNetworkMod.nn import NeuralNetwork
from Decryptor.basicEncryption.basic_parent import BasicParent
from Decryptor.Hash.hashParent import HashParent
from Decryptor.Encoding.encodingParent import EncodingParent
import mathsHelper as mh
def decrypt(config: iface.Config, ctext: Any) -> List[SearchLevel]:
"""A simple alias that searches a ctext and prettifies the answer"""
res: iface.SearchResult = config.objs["searcher"].search(ctext)
if config.verbosity < 0:
return res.path[-1].result.value
else:
return iface.pretty_search_results(res)
def make_default_config(ctext: str, trace: bool = False) -> Dict[str, object]:
from ciphey.LanguageChecker.brandon import ciphey_language_checker as brandon
import cipheydists
return {
"ctext": ctext,
"grep": False,
"info": False,
"debug": "TRACE" if trace else "WARNING",
"checker": brandon,
"wordlist": set(cipheydists.get_list("english")),
"params": {},
}
class Ciphey:
config = dict()
params = dict()
def __init__(self, config):
logger.remove()
logger.configure()
logger.add(sink=sys.stderr, level=config["debug"], colorize=sys.stderr.isatty())
logger.opt(colors=True)
logger.debug(f"""Debug level set to {config["debug"]}""")
# general purpose modules
self.ai = NeuralNetwork()
self.lc = config["checker"](config)
self.mh = mh.mathsHelper()
# the one bit of text given to us to decrypt
self.text: str = config["ctext"]
self.basic = BasicParent(self.lc)
self.hash = HashParent(self.lc)
self.encoding = EncodingParent(self.lc)
self.level: int = 1
self.config = config
self.console = Console()
self.probability_distribution: dict = {}
self.what_to_choose: dict = {}
def decrypt(self) -> Optional[Dict]:
"""Performs the decryption of text
Creates the probability table, calls one_level_of_decryption
Args:
None, it uses class variables.
Returns:
        A result dict if decryption succeeds, None otherwise.
"""
# Read the documentation for more on this function.
# checks to see if inputted text is plaintext
result = self.lc.checkLanguage(self.text)
if result:
print("You inputted plain text!")
return {
"lc": self.lc,
"IsPlaintext?": True,
"Plaintext": self.text,
"Cipher": None,
"Extra Information": None,
}
self.probability_distribution: dict = self.ai.predictnn(self.text)[0]
self.what_to_choose: dict = {
self.hash: {
"sha1": self.probability_distribution[0],
"md5": self.probability_distribution[1],
"sha256": self.probability_distribution[2],
"sha512": self.probability_distribution[3],
},
self.basic: {"caesar": self.probability_distribution[4]},
"plaintext": {"plaintext": self.probability_distribution[5]},
self.encoding: {
"reverse": self.probability_distribution[6],
"base64": self.probability_distribution[7],
"binary": self.probability_distribution[8],
"hexadecimal": self.probability_distribution[9],
"ascii": self.probability_distribution[10],
"morse": self.probability_distribution[11],
},
}
        logger.trace(
            f"The probability table before the 0.01 floor in __main__ is {self.what_to_choose}"
        )
# sorts each individual sub-dictionary
for key, value in self.what_to_choose.items():
for k, v in value.items():
# Sets all 0 probabilities to 0.01, we want Ciphey to try all decryptions.
if v < 0.01:
# this should turn off hashing functions if offline mode is turned on
self.what_to_choose[key][k] = 0.01
        logger.trace(
            f"The probability table after the 0.01 floor in __main__ is {self.what_to_choose}"
        )
self.what_to_choose: dict = self.mh.sort_prob_table(self.what_to_choose)
# Creates and prints the probability table
if not self.config["grep"]:
self.produceprobtable(self.what_to_choose)
logger.debug(
f"The new probability table after sorting in __main__ is {self.what_to_choose}"
)
"""
#for each dictionary in the dictionary
# sort that dictionary
#sort the overall dictionary by the first value of the new dictionary
"""
output = None
if self.level <= 1:
output = self.one_level_of_decryption()
        else:
            # TODO: make tmpfile
            with open("decryptionContents.txt", "w") as f:
                output = self.one_level_of_decryption(file=f)
                for i in range(0, self.level):
                    # open file and go through each text item
                    pass
logger.debug(f"decrypt is outputting {output}")
return output
def produceprobtable(self, prob_table) -> None:
"""Produces the probability table using Rich's API
Uses Rich's API to print the probability table.
Args:
prob_table -> the probability table generated by the neural network
Returns:
None, but prints the probability table.
"""
logger.debug(f"Producing log table")
table = Table(show_header=True, header_style="bold magenta")
table.add_column("Name of Cipher")
table.add_column("Probability", justify="right")
# for every key, value in dict add a row
# I think key is self.caesarcipher and not "caesar cipher"
# i must callName() somewhere else in this code
sorted_dic: dict = {}
for k, v in prob_table.items():
for key, value in v.items():
# Prevents the table from showing pointless 0.01 probs as they're faked
if value <= 0.01:
continue
# gets the string ready to print
logger.debug(f"Key is {str(key)} and value is {str(value)}")
val: int = round(self.mh.percentage(value, 1), 2)
key_str: str = str(key).capitalize()
# converts "Bases" to "Base"
if "Base" in key_str:
key_str = key_str[0:-2]
sorted_dic[key_str] = val
logger.debug(f"The value as percentage is {val} and key is {key_str}")
sorted_dic: dict = {
k: v
for k, v in sorted(
sorted_dic.items(), key=lambda item: item[1], reverse=True
)
}
for k, v in sorted_dic.items():
table.add_row(k, str(v) + "%")
self.console.print(table)
return None
def one_level_of_decryption(self) -> Optional[dict]:
"""Performs one level of encryption.
Either uses alive_bar or not depending on if self.greppable is set.
Returns:
None.
"""
# Calls one level of decryption
# mainly used to control the progress bar
output = None
if self.config["grep"]:
logger.debug("__main__ is running as greppable")
output = self.decrypt_normal()
else:
logger.debug("__main__ is running with progress bar")
output = self.decrypt_normal()
return output
def decrypt_normal(self, bar=None) -> Optional[dict]:
"""Called by one_level_of_decryption
Performs a decryption, but mainly parses the internal data packet and prints useful information.
Args:
            bar -> whether or not to use alive_bar
Returns:
str if found, or None if not
"""
# This is redundant
# result = self.lc.checkLanguage(self.text)
# if result:
# print("You inputted plain text!")
# print(f"Returning {self.text}")
# return self.text
logger.debug(f"In decrypt_normal")
for key, val in self.what_to_choose.items():
# https://stackoverflow.com/questions/4843173/how-to-check-if-type-of-a-variable-is-string
if not isinstance(key, str):
key.setProbTable(val)
ret: dict = key.decrypt(self.text)
logger.debug(f"Decrypt normal in __main__ ret is {ret}")
logger.debug(
f"The plaintext is {ret['Plaintext']} and the extra information is {ret['Cipher']} and {ret['Extra Information']}"
)
if ret["IsPlaintext?"]:
logger.debug(f"Ret is plaintext")
print(ret["Plaintext"])
if self.config["info"]:
logger.trace("Self.cipher_info runs")
if ret["Extra Information"] is not None:
print(
"The cipher used is",
ret["Cipher"] + ".",
ret["Extra Information"] + ".",
)
else:
print("The cipher used is " + ret["Cipher"] + ".")
return ret
logger.debug("No encryption found")
print(
"""No encryption found. Here are some tips to help crack the cipher:
* Use the probability table to work out what it could be. Base = base16, base32, base64 etc.
* If the probability table says 'Caesar Cipher' then it is a normal encryption that \
Ciphey cannot decrypt yet.
* If Ciphey thinks it's a hash, try using hash-identifier to find out what hash it is, \
and then HashCat to crack the hash.
* The encryption may not contain normal English plaintext. It could be coordinates or \
another object not found in the dictionary. Use 'ciphey -d true > log.txt' to generate a log \
file of all attempted decryptions and manually search it."""
)
return None
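The probability-table handling in `Ciphey.decrypt` above (floor every sub-probability below 0.01 so no decryption path is skipped outright, then sort each sub-table best-first) can be sketched in isolation. `floor_and_sort` is a hypothetical stand-in for the flooring loop plus the `mathsHelper.sort_prob_table` call, not the real implementation:

```python
def floor_and_sort(prob_table: dict, floor: float = 0.01) -> dict:
    # Raise every probability below `floor` up to `floor`, so even
    # "impossible" decryptions remain candidates.
    for sub_table in prob_table.values():
        for name, prob in sub_table.items():
            if prob < floor:
                sub_table[name] = floor
    # Sort each sub-table so the most likely candidate is tried first.
    return {
        parent: dict(sorted(sub.items(), key=lambda kv: kv[1], reverse=True))
        for parent, sub in prob_table.items()
    }

table = {
    "encoding": {"base64": 0.0, "morse": 0.7},
    "hash": {"md5": 0.2, "sha1": 0.005},
}
table = floor_and_sort(table)
```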
# def arg_parsing(config: iface.Config) -> Optional[Dict[str, Any]]:
# """This function parses arguments.
#
# Args:
# config: The configuration object
# Returns:
# The config to be passed around for the rest of time
# """
#
# # parser.add_argument(
# # "--default-wordlist",
# # help="Sets the default wordlist",
# # action="store",
# # default=None
# # )
#
# args = config
#
# # First, we should work out how verbose we should be
#
# # Now we have set the log level, we can start debugging
# logger.trace(f"Got arguments {args}")
#
# # the below text does:
# # * if -t is supplied, use that
# # * if ciphey is called like:
# # * REMOVED: ciphey 'encrypted text' use that
# # else if data is piped like:
# # echo 'hello' | ciphey use that
# # if no data is supplied, no arguments supplied.
# text = None
# if args["text"] is not None:
# text = args["text"]
# else:
# print("No input given.")
# exit(1)
#
# if len(sys.argv) == 1:
# print("No arguments were supplied. Look at the help menu with -h or --help")
# return None
#
# args["text"] = text
# if len(args["text"]) < 3:
# print("A string of less than 3 chars cannot be interpreted by Ciphey.")
# return None
#
# # Now we can walk through the arguments, expanding them into the config struct
# config["checker"] = args.get("checker")
# config["info"] = args.get("info")
# config["in"] = args.get("bytes_input")
# config["out"] = args.get("bytes_output")
# config["default_dist"] = args.get("default_dist")
#
# # Append the module lists:
# if not "modules" in config:
# config["modules"] = args["module"]
# else:
# config["modules"] += args["module"]
# print(f"Config modules is {config['modules']}")
# config.load_modules()
# # Now we can walk through the arguments, expanding them into a canonical form
# #
# # First, we go over simple args
# config["info"] = False
# config["ctext"] = args["text"]
# config["grep"] = args["greppable"]
# config["offline"] = args["offline"]
#
# # Verbosity levels
# if args["verbose"] >= 3:
# config["debug"] = "TRACE"
# config.update_log_level("TRACE")
# elif args["verbose"] == 2:
# config["debug"] = "DEBUG"
# config.update_log_level("DEBUG")
# elif args["verbose"] == 1:
# config["debug"] = "ERROR"
# config.update_log_level("ERROR")
# else:
# config["debug"] = "WARNING"
#
# if args["silent"]:
# config.update_log_level(None)
# config.grep = True
#
# # Try to locate language checker module
# # TODO: actually implement this
# from ciphey.LanguageChecker.brandon import ciphey_language_checker as brandon
#
# config["checker"] = brandon
# # Try to locate language checker module
# # TODO: actually implement this (should be similar)
# import cipheydists
#
# # Now we fill in the params *shudder*
# for i in args["param"]:
# key, value = i.split("=", 1)
# parent, name = key.split(".", 1)
# config.update_param(parent, name, value)
#
# # Now we have parsed and loaded everything else, we can load the objects
# config.load_objs()
#
# return args
#
def get_name(ctx, param, value):
    # reads from stdin if the argument wasn't supplied
return locals()
def arg_parsing(args) -> Optional[dict]:
"""This function parses arguments.
Args:
        args -> the dict of command-line arguments
Returns:
The config to be passed around for the rest of time
"""
    # The code below does the following:
    # * if -t is supplied, use that
    # * if ciphey is called like `ciphey 'encrypted text'`, use that
    # * else if data is piped like `echo 'hello' | ciphey`, use that
    # * if no data is supplied, treat it as no arguments supplied
text = None
if args["text"] is not None:
text = args["text"]
else:
print("No input given.")
exit(1)
if len(sys.argv) == 1:
print("No arguments were supplied. Look at the help menu with -h or --help")
return None
args["text"] = text
if len(args["text"]) < 3:
print("A string of less than 3 chars cannot be interpreted by Ciphey.")
return None
config = dict()
# Now we can walk through the arguments, expanding them into a canonical form
#
# First, we go over simple args
config["info"] = False
config["ctext"] = args["text"]
config["grep"] = args["greppable"]
config["offline"] = args["offline"]
if args["verbose"] >= 3:
config["debug"] = "TRACE"
elif args["verbose"] == 2:
config["debug"] = "DEBUG"
elif args["verbose"] == 1:
config["debug"] = "ERROR"
else:
config["debug"] = "WARNING"
# Try to locate language checker module
# TODO: actually implement this
from ciphey.LanguageChecker.brandon import ciphey_language_checker as brandon
config["checker"] = brandon
# Try to locate language checker module
# TODO: actually implement this (should be similar)
import cipheydists
config["wordlist"] = set(cipheydists.get_list("english"))
# Now we fill in the params *shudder*
config["params"] = {}
return config
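The `-p`/`--param` values handled further down are split as `parent.name=value` strings before being handed to `config.update_param`. A minimal sketch of that split (`parse_param` is an illustrative helper, not part of Ciphey):

```python
def parse_param(spec: str) -> tuple:
    # "brandon.threshold=0.45" -> ("brandon", "threshold", "0.45").
    # Split once on "=" first, so values may themselves contain "=" or ".".
    key, value = spec.split("=", 1)
    parent, name = key.split(".", 1)
    return parent, name, value
```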
@click.command()
@click.option(
"-t", "--text", help="The ciphertext you want to decrypt.", type=str,
)
@click.option(
    "-i",
    "--info",
    help="Do you want information on the cipher used?",
    type=bool,
    is_flag=True,
)
@click.option(
"-q",
"--quiet",
help="Decrease verbosity",
type=int,
count=True,
default=None
)
@click.option(
"-g",
"--greppable",
help="Only print the answer (useful for grep)",
type=bool,
is_flag=True,
default=None
)
@click.option("-v", "--verbose", count=True, type=int)
@click.option(
    "-C",
    "--checker",
    help="Use the given checker",
    default=None,
)
@click.option(
    "-c",
    "--config",
    help="Uses the given config file. Defaults to appdirs.user_config_dir('ciphey', 'ciphey')/'config.yml'",
)
@click.option("-w", "--wordlist", help="Uses the given internal wordlist")
@click.option("-w", "--wordlist", help="Uses the given wordlist")
@click.option(
    "-p",
    "--param",
    help="Passes a parameter to the language checker",
    multiple=True,
)
@click.option(
    "-l", "--list-params", help="List the parameters of the selected module", type=bool,
)
@click.option(
"-O",
    "--offline",
type=bool,
is_flag=True,
)
@click.option(
"--searcher",
help="Select the searching algorithm to use",
)
# HARLAN TODO XXX
# I switched this to a boolean flag system
# https://click.palletsprojects.com/en/7.x/options/#boolean-flags
# True for bytes input, False for str
@click.option(
"-b",
"--bytes-input",
help="Forces ciphey to use binary mode for the input. Rather experimental and may break things!",
is_flag=True,
default=None
)
# HARLAN TODO XXX
# I switched this to a boolean flag system
# https://click.palletsprojects.com/en/7.x/options/#boolean-flags
@click.option(
"-B",
"--bytes-output",
help="Forces ciphey to use binary mode for the output. Rather experimental and may break things!",
is_flag=True,
default=None
)
@click.option(
"--default-dist",
help="Sets the default character/byte distribution",
type=str,
default=None
)
@click.option(
"-m", "--module", help="Adds a module from the given path", type=click.Path(), multiple=True,
)
@click.option(
"-A",
"--appdirs",
help="Print the location of where Ciphey wants the settings file to be",
type=bool
)
@click.argument("text_stdin", callback=get_name, required=False)
@click.argument("file_stdin", type=click.File("rb"), required=False)
def main(**kwargs) -> Optional[dict]:
"""Ciphey - Automated Decryption Tool
Documentation:
Ciphey is an automated decryption tool using smart artificial intelligence and natural language processing. Input encrypted text, get the decrypted text back.
Examples:\n
    Basic Usage: ciphey -t "aGVsbG8gbXkgbmFtZSBpcyBiZWU="
"""
    # If the user wants to know where the settings dir is, print it and exit
    if kwargs["appdirs"] is not None:
        import appdirs

        print(appdirs.user_config_dir("ciphey", "ciphey"))
        return None
    # Now we create the config object
    config = iface.Config()
# Load the settings file into the config
cfg_arg = kwargs["config"]
if cfg_arg is None:
# Make sure that the config dir actually exists
os.makedirs(iface.Config.get_default_dir(), exist_ok=True)
config.load_file(create=True)
else:
config.load_file(cfg_arg)
# Load the verbosity, so that we can start logging
verbosity = kwargs["verbose"]
quiet = kwargs["quiet"]
if verbosity is None:
if quiet is not None:
verbosity = -quiet
elif quiet is not None:
verbosity -= quiet
if kwargs["greppable"] is not None:
verbosity -= 999
# Use the existing value as a base
config.verbosity += verbosity
config.update_log_level(config.verbosity)
logger.trace(f"Got cmdline args {kwargs}")
# Now we load the modules
module_arg = kwargs["module"]
if module_arg is not None:
config.modules += list(module_arg)
config.load_modules()
# We need to load formats BEFORE we instantiate objects
if kwargs["bytes_input"] is not None:
config.update_format("in", "bytes")
    if kwargs["bytes_output"] is not None:
        config.update_format("out", "bytes")
# Next, load the objects
params = kwargs["param"]
if params is not None:
for i in params:
key, value = i.split("=", 1)
parent, name = key.split(".", 1)
config.update_param(parent, name, value)
config.update("checker", kwargs["checker"])
config.update("searcher", kwargs["searcher"])
config.update("default_dist", kwargs["default_dist"])
config.load_objs()
logger.trace(f"Config finalised: {config}")
# Finally, we load the plaintext
if kwargs["text"] is None:
if kwargs["file_stdin"] is not None:
kwargs["text"] = kwargs["file_stdin"].read().decode("utf-8")
elif kwargs["text_stdin"] is not None:
kwargs["text"] = kwargs["text_stdin"]
else:
print("No inputs were given to Ciphey. For usage, run ciphey --help")
logger.critical("No text input given!")
return None
print(decrypt(config, kwargs["text"]))
if __name__ == "__main__":
# withArgs because this function is only called
# if the program is run in terminal
main()
#with click_spinner.spinner():
# result = main()
result = main()
if result is not None:
print(result)

from . import quorum, regex, brandon

"""
© Brandon Skerritt
Github: brandonskerritt
Class to determine whether something is English or not.
1. Calculate the Chi Squared score of a sentence
2. If the score is significantly lower than the average score, it _might_ be English
2.1. If the score _might_ be English, then take the text and compare it to the sorted dictionary
in O(n log n) time.
It creates a percentage of "How much of this text is in the dictionary?"
The dictionary contains:
* 20,000 most common US words
* 10,000 most common UK words (there's no repetition between the two)
* The top 10,000 passwords
If the word "Looks like" English (chi-squared) and if it contains English words, we can conclude it is
very likely English. The alternative is doing the dictionary thing but with an entire 479k word dictionary (slower)
2.2. If the score is not English, but we haven't tested enough to create an average, then test it against
the dictionary
Things to optimise:
* We only run the dictionary if it's 20% smaller than the average for chi squared
* We consider it "English" if 45% of the text matches the dictionary
* We run the dictionary if there are fewer than 10 total chi-squared tests
How to add a language:
* Download your desired dictionary. Try to make it the most popular words, for example. Place this file into this
folder with languagename.txt
As an example, this comes built in with english.txt
Find the statistical frequency of each letter in that language.
For English, we have:
self.languages = {
"English":
[0.0855, 0.0160, 0.0316, 0.0387, 0.1210,0.0218, 0.0209, 0.0496, 0.0733, 0.0022,0.0081, 0.0421, 0.0253, 0.0717,
0.0747,0.0207, 0.0010, 0.0633, 0.0673, 0.0894,0.0268, 0.0106, 0.0183, 0.0019, 0.0172,0.0011]
}
In chisquared.py
To add your language, do:
self.languages = {
"English":
[0.0855, 0.0160, 0.0316, 0.0387, 0.1210,0.0218, 0.0209, 0.0496, 0.0733, 0.0022,0.0081, 0.0421, 0.0253, 0.0717,
0.0747,0.0207, 0.0010, 0.0633, 0.0673, 0.0894,0.0268, 0.0106, 0.0183, 0.0019, 0.0172,0.0011],
"German": [0.0973]
}
In alphabetical order
And you're... done! Make sure the two names match up.
"""
from typing import Dict, Set, Optional, Any
import ciphey
from string import punctuation
from loguru import logger
import string
import os
import sys
from math import ceil
from ciphey.iface import T, registry
sys.path.append("..")
try:
import mathsHelper as mh
except ModuleNotFoundError:
import ciphey.mathsHelper as mh
@registry.register
class Brandon(ciphey.iface.Checker[str]):
"""
Class designed to confirm whether something is **language** based on how many words of **language** appears
Call confirmLanguage(text, language)
* text: the text you want to confirm
* language: the language you want to confirm
Find out what language it is by using chisquared.py, the highest chisquared score is the language
languageThreshold = 45
    if a string is 45% **language** words, then it's confirmed to be that language
"""
def getExpectedRuntime(self, text: T) -> float:
# TODO: actually work this out
return 1e-4 # 100 µs
wordlist: set
def clean_text(self, text: str) -> set:
"""Cleans the text ready to be checked
Strips punctuation, makes it lower case, turns it into a set separated by spaces, removes duplicate words
Args:
text -> The text we use to perform analysis on
Returns:
text -> the text as a list, now cleaned
"""
# makes the text unique words and readable
text = text.lower()
text = self.mh.strip_puncuation(text)
text = text.split(" ")
text = set(text)
return text
def checker(self, text: str, threshold: float, text_length: int, var: set) -> bool:
"""Given text determine if it passes checker
The checker uses the vairable passed to it. I.E. Stopwords list, 1k words, dictionary
Args:
text -> The text to check
threshold -> at what point do we return True? The percentage of text that is in var before we return True
text_length -> the length of the text
var -> the variable we are checking against. Stopwords list, 1k words list, dictionray list.
Returns:
boolean -> True for it passes the test, False for it fails the test."""
if text is None:
logger.trace(f"Checker's text is None, so returning False")
return False
if var is None:
logger.trace(f"Checker's input var is None, so returning False")
return False
percent = ceil(text_length * threshold)
logger.trace(f"Checker's chunks are size {percent}")
        meet_threshold = 0
        meet_threshold_percent = 0
        location = 0
        end = percent
        text = list(text)
        while location <= text_length:
            # chunks the text, so we only analyse threshold-sized chunks at a time
            to_analyse = text[location:end]
logger.trace(f"To analyse is {to_analyse}")
for word in to_analyse:
# if word is a stopword, + 1 to the counter
if word in var:
logger.trace(
f"{word} is in var, which means I am +=1 to the meet_threshold which is {meet_threshold}"
)
meet_threshold += 1
meet_threshold_percent = meet_threshold / text_length
if meet_threshold_percent >= threshold:
logger.trace(
f"Returning true since the percentage is {meet_threshold / text_length} and the threshold is {threshold}"
)
# if we meet the threshold, return True
# otherwise, go over again until we do
# We do this in the for loop because if we're at 24% and THRESHOLD is 25
# we don't want to wait THRESHOLD to return true, we want to return True ASAP
return True
location = end
end = end + percent
logger.trace(
f"The language proportion {meet_threshold_percent} is under the threshold {threshold}"
)
return False
def __init__(self, config: ciphey.iface.Config):
# Suppresses warning
super().__init__(config)
self.mh = mh.mathsHelper()
phases = config.get_resource(self._params()["phases"])
self.thresholds_phase1 = phases["1"]
self.thresholds_phase2 = phases["2"]
self.top1000Words = config.get_resource(self._params().get("top1000"))
self.wordlist = config.get_resource(self._params()["wordlist"])
self.stopwords = config.get_resource(self._params().get("stopwords"))
self.len_phase1 = len(self.thresholds_phase1)
self.len_phase2 = len(self.thresholds_phase2)
def check(self, text: str) -> Optional[str]:
"""Checks to see if the text is in English
Performs a decryption, but mainly parses the internal data packet and prints useful information.
Args:
text -> The text we use to perform analysis on
Returns:
bool -> True if the text is English, False otherwise.
"""
logger.trace(f'In Language Checker with "{text}"')
text = self.clean_text(text)
logger.trace(f'Text split to "{text}"')
if text == "":
return None
length_text = len(text)
# "Phase 1": {0: {"check": 0.02}, 110: {"stop": 0.15}, 150: {"stop": 0.28}}
# Phase 1 checking
what_to_use = {}
# this code decides what checker / threshold to use
# if text is over or equal to maximum size, just use the maximum possible checker
what_to_use = self.calculateWhatChecker(
length_text, self.thresholds_phase1.keys()
)
logger.trace(f"What to use is {what_to_use}")
logger.trace(self.thresholds_phase1)
what_to_use = self.thresholds_phase1[str(what_to_use)]
# def checker(self, text: str, threshold: float, text_length: int, var: set) -> bool:
if "check" in what_to_use:
# perform check 1k words
result = self.checker(
text, what_to_use["check"], length_text, self.top1000Words
)
logger.trace(f"The result from check 1k words is {result}")
elif "stop" in what_to_use:
# perform stopwords
result = self.checker(
text, what_to_use["stop"], length_text, self.stopwords
)
logger.trace(f"The result from check stopwords is {result}")
        else:
            logger.debug(f"It is neither stop nor check, but instead {what_to_use}")
            result = False
# return False if phase 1 fails
if not result:
return None
else:
what_to_use = self.calculateWhatChecker(
length_text, self.thresholds_phase2.keys()
)
what_to_use = self.thresholds_phase2[str(what_to_use)]
result = self.checker(
text, what_to_use["dict"], length_text, self.wordlist
)
logger.trace(f"Result of dictionary checker is {result}")
return "" if result else None
def calculateWhatChecker(self, length_text, key):
"""Calculates what threshold / checker to use
If the length of the text is over the maximum sentence length, use the last checker / threshold
Otherwise, traverse the keys backwards until we find a key range that does not fit.
So we traverse backwards and see if the sentence length is between current - 1 and current
In this way, we find the absolute lowest checker / percentage threshold.
We traverse backwards because if the text is longer than the max sentence length, we already know.
In total, the keys are only 5 items long or so. It is not expensive to move backwards, nor is it expensive to move forwards.
Args:
length_text -> The length of the text
key -> What key we want to use. I.E. Phase1 keys, Phase2 keys.
Returns:
what_to_use -> the key of the lowest checker."""
_keys = list(key)
_keys = list(map(int, _keys))
        if length_text >= int(_keys[-1]):
            what_to_use = _keys[-1]
else:
# this algorithm finds the smallest possible fit for the text
for counter, i in reversed(list(enumerate(_keys))):
if counter != 0:
if _keys[counter - 1] <= length_text <= i:
what_to_use = i
return what_to_use
@staticmethod
def getParams() -> Optional[Dict[str, ciphey.iface.ParamSpec]]:
return {
"top1000": ciphey.iface.ParamSpec(
desc="A wordlist of the top 1000 words",
req=False,
default="cipheydists::list::english1000",
),
"wordlist": ciphey.iface.ParamSpec(
desc="A wordlist of all the words",
req=False,
default="cipheydists::list::english",
),
"stopwords": ciphey.iface.ParamSpec(
desc="A wordlist of StopWords",
req=False,
default="cipheydists::list::englishStopWords",
),
"threshold": ciphey.iface.ParamSpec(
desc="The minimum proportion (between 0 and 1) that must be in the dictionary",
req=False,
default=0.45,
),
"phases": ciphey.iface.ParamSpec(
desc="Language-specific phase thresholds",
req=False,
default="cipheydists::brandon::english",
),
}
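The phase-threshold selection performed by calculateWhatChecker (pick the smallest bucket whose range covers the text length, or the last bucket if the text is longer than all of them) can be illustrated with a small stand-alone helper; `pick_threshold_key` is hypothetical and only mirrors the behaviour:

```python
def pick_threshold_key(length: int, keys) -> int:
    # `keys` come from the phase dicts, e.g. {"0": ..., "110": ..., "150": ...}
    keys = sorted(int(k) for k in keys)
    if length >= keys[-1]:
        # Longer than the largest sentence length: use the last bucket
        return keys[-1]
    for prev, cur in zip(keys, keys[1:]):
        # Return the first bucket whose [prev, cur] range covers the length
        if prev <= length <= cur:
            return cur
```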

from math import ceil
from typing import Optional, Dict, Generic
import ciphey
from ciphey.iface import ParamSpec, Config, T
class Quorum(Generic[T], ciphey.iface.Checker[T]):
    def check(self, text: T) -> Optional[str]:
        left = self.k
        results = []
        for checker in self.checkers:
            results.append(checker.check(text))
            if results[-1] is None:
                continue
            left -= 1
            # Early return check
            if left == 0:
                return str(results)

    def __init__(self, config: Config):
        super().__init__(config)
        self.k = self._params().k
        if self.k is None:
            self.k = len(self._params()["checker"])
        # These checks need to be separate, to make sure that we do not have zero members
        if self.k == 0 or self.k > len(self._params()["checker"]):
            raise IndexError(
                "k must be between 0 and the number of checkers (inclusive)"
            )
self.checkers = []
for i in self._params()["checker"]:
# This enforces type consistency
self.checkers.append(
ciphey.iface._registry.get_named(i, ciphey.iface.Checker[T])
)
@staticmethod
def getParams() -> Optional[Dict[str, ParamSpec]]:
return {
"checker": ParamSpec(
req=True, desc="The checkers to be used for analysis", list=True
),
"k": ParamSpec(
req=False,
desc="The minimum quorum size. Defaults to the number of checkers",
),
}
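The k-of-n voting that Quorum.check implements reduces to the following sketch, with plain callables standing in for Checker objects (a None return means the checker failed):

```python
def quorum_passes(checkers, text, k: int) -> bool:
    # A checker "passes" by returning a non-None value; the quorum is met
    # as soon as k checkers have passed, without running the rest.
    passed = 0
    for check in checkers:
        if check(text) is None:
            continue
        passed += 1
        if passed == k:
            return True
    return False

always = lambda text: ""   # a checker that always passes
never = lambda text: None  # a checker that always fails
```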

from typing import Optional, Dict
import ciphey
import re
from ciphey.iface import ParamSpec, T, Config, registry
from loguru import logger
@registry.register
class Regex(ciphey.iface.Checker[str]):
def getExpectedRuntime(self, text: T) -> float:
return 1e-5 # TODO: actually calculate this
def __init__(self, config: Config):
super().__init__(config)
self.regexes = list(map(re.compile, self._params()["regex"]))
logger.trace(f"There are {len(self.regexes)} regexes")
def check(self, text: str) -> Optional[str]:
for regex in self.regexes:
logger.trace(f"Trying regex {regex} on {text}")
res = regex.search(text)
logger.trace(f"Results: {res}")
if res:
return f"passed with regex {regex}"
@staticmethod
def getParams() -> Optional[Dict[str, ParamSpec]]:
return {
"regex": ParamSpec(
req=True,
desc="The regex that must be matched (in a substring)",
list=True,
)
}
@staticmethod
def getName() -> str:
return "regex"

from . import caesar, vigenere

"""
© Brandon Skerritt
Github: brandonskerritt
"""
from distutils import util
from typing import Optional, Dict, Union, Set, List
from loguru import logger
import ciphey
import cipheycore
from ciphey.iface import ParamSpec, CrackResult, T, CrackInfo, registry
@registry.register
class Caesar(ciphey.iface.Cracker[str]):
def getInfo(self, ctext: T) -> CrackInfo:
analysis = self.cache.get_or_update(
ctext,
"cipheycore::simple_analysis",
lambda: cipheycore.analyse_string(ctext),
)
return CrackInfo(
success_likelihood=cipheycore.caesar_detect(analysis, self.expected),
# TODO: actually calculate runtimes
success_runtime=1e-4,
failure_runtime=1e-4,
)
@staticmethod
def getTarget() -> str:
return "caesar"
def attemptCrack(self, ctext: str) -> List[CrackResult]:
logger.debug("Trying caesar cipher")
# Convert it to lower case
#
# TODO: handle different alphabets
if self.lower:
message = ctext.lower()
else:
message = ctext
logger.trace("Beginning cipheycore simple analysis")
# Hand it off to the core
analysis = self.cache.get_or_update(
ctext,
"cipheycore::simple_analysis",
lambda: cipheycore.analyse_string(message),
)
logger.trace("Beginning cipheycore::caesar")
possible_keys = cipheycore.caesar_crack(
analysis, self.expected, self.group, True, self.p_value
)
n_candidates = len(possible_keys)
logger.debug(f"Caesar returned {n_candidates} candidates")
candidates = []
for candidate in possible_keys:
translated = cipheycore.caesar_decrypt(message, candidate.key, self.group)
candidates.append(CrackResult(value=translated, key_info=candidate.key))
return candidates
@staticmethod
def getParams() -> Optional[Dict[str, ParamSpec]]:
return {
"expected": ciphey.iface.ParamSpec(
desc="The expected distribution of the plaintext",
req=False,
config_ref=["default_dist"],
),
"group": ciphey.iface.ParamSpec(
desc="An ordered sequence of chars that make up the caesar cipher alphabet",
req=False,
default="abcdefghijklmnopqrstuvwxyz",
),
"lower": ciphey.iface.ParamSpec(
desc="Whether or not the ciphertext should be converted to lowercase first",
req=False,
default=True,
),
"p_value": ciphey.iface.ParamSpec(
desc="The p-value to use for standard frequency analysis",
req=False,
default=0.1,
)
# TODO: add "filter" param
}
@staticmethod
def scoreUtility() -> float:
return 1.5
def __init__(self, config: ciphey.iface.Config):
super().__init__(config)
self.lower: Union[str, bool] = self._params()["lower"]
if not isinstance(self.lower, bool):
self.lower = bool(util.strtobool(self.lower))
self.group = list(self._params()["group"])
self.expected = config.get_resource(self._params()["expected"])
self.cache = config.cache
self.p_value = self._params()["p_value"]
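The cracker above hands the statistics off to cipheycore; the underlying idea can be sketched in plain Python by trying all 26 shifts and scoring each candidate against a small common-word list. The word list and the word-count scoring here are illustrative assumptions, not cipheycore's frequency test:

```python
import string

# Illustrative stand-in for a proper English fitness measure
ENGLISH_COMMON = {"the", "and", "a", "of", "to", "in", "it", "is"}

def caesar_shift(text: str, key: int, alphabet: str = string.ascii_lowercase) -> str:
    """Decrypt text by shifting each alphabet char back by key places."""
    n = len(alphabet)
    index = {c: i for i, c in enumerate(alphabet)}
    return "".join(
        alphabet[(index[c] - key) % n] if c in index else c for c in text
    )

def crack_caesar(ctext: str) -> str:
    """Pick the shift whose plaintext contains the most common English words."""
    return max(
        (caesar_shift(ctext, k) for k in range(26)),
        key=lambda cand: sum(w in ENGLISH_COMMON for w in cand.split()),
    )
```

The real cracker replaces the word-count score with a frequency-analysis p-value, which also works on text without word boundaries.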

View File

@ -0,0 +1,251 @@
"""
© Brandon Skerritt
Github: brandonskerritt
"""
from distutils import util
from typing import Optional, Dict, Union, Set, List
import re
from loguru import logger
import ciphey
import cipheycore
from ciphey.iface import ParamSpec, Cracker, CrackResult, T, CrackInfo, registry
@registry.register
class Vigenere(ciphey.iface.Cracker[str]):
def getInfo(self, ctext: T) -> CrackInfo:
if self.keysize is not None:
analysis = self.cache.get_or_update(
ctext,
f"vigenere::{self.keysize}",
lambda: cipheycore.analyse_string(ctext, self.keysize, self.group),
)
return CrackInfo(
success_likelihood=cipheycore.vigenere_detect(analysis, self.expected),
# TODO: actually calculate runtimes
success_runtime=1e-4,
failure_runtime=1e-4,
)
else:
return CrackInfo(
success_likelihood=0.5, # TODO: actually work this out
# TODO: actually calculate runtimes
success_runtime=1e-4,
failure_runtime=1e-4,
)
@staticmethod
def getTarget() -> str:
return "vigenere"
def crackOne(
self, ctext: str, analysis: cipheycore.windowed_analysis_res
) -> List[CrackResult]:
possible_keys = cipheycore.vigenere_crack(
analysis, self.expected, self.group, self.p_value
)
return [
CrackResult(
value=cipheycore.vigenere_decrypt(ctext, candidate.key, self.group),
key_info="".join([self.group[i] for i in candidate.key]),
)
for candidate in possible_keys
]
def attemptCrack(self, ctext: str) -> List[CrackResult]:
logger.debug("Trying vigenere cipher")
# Convert it to lower case
if self.lower:
message = ctext.lower()
else:
message = ctext
# Analysis must be done here, where we know the case for the cache
if self.keysize is not None:
return self.crackOne(
message,
self.cache.get_or_update(
ctext,
f"vigenere::{self.keysize}",
lambda: cipheycore.analyse_string(ctext, self.keysize, self.group),
),
)
else:
arrs = []
possible_len = self.kasiskiExamination(message)
possible_len.sort()
logger.trace(f"Got possible lengths {possible_len}")
# TODO: work out length
for i in possible_len:
arrs.extend(
self.crackOne(
message,
self.cache.get_or_update(
ctext,
f"vigenere::{i}",
lambda: cipheycore.analyse_string(ctext, i, self.group),
),
)
)
logger.debug(f"Vigenere returned {len(arrs)} candidates")
return arrs
@staticmethod
def getParams() -> Optional[Dict[str, ParamSpec]]:
return {
"expected": ciphey.iface.ParamSpec(
desc="The expected distribution of the plaintext",
req=False,
config_ref=["default_dist"],
),
"group": ciphey.iface.ParamSpec(
desc="An ordered sequence of chars that make up the caesar cipher alphabet",
req=False,
default="abcdefghijklmnopqrstuvwxyz",
),
"lower": ciphey.iface.ParamSpec(
desc="Whether or not the ciphertext should be converted to lowercase first",
req=False,
default=True,
),
"keysize": ciphey.iface.ParamSpec(
desc="A key size that should be used. If not given, will attempt to work it out",
req=False,
),
"p_value": ciphey.iface.ParamSpec(
desc="The p-value to use for windowed frequency analysis",
req=False,
default=0.99,
),
}
def __init__(self, config: ciphey.iface.Config):
super().__init__(config)
self.lower: Union[str, bool] = self._params()["lower"]
if not isinstance(self.lower, bool):
self.lower = bool(util.strtobool(self.lower))
self.group = list(self._params()["group"])
self.expected = config.get_resource(self._params()["expected"])
self.cache = config.cache
self.keysize = self._params().get("keysize")
if self.keysize is not None:
self.keysize = int(self.keysize)
self.MAX_KEY_LENGTH = 16 # Will not attempt keys longer than this.
self.p_value = self._params()["p_value"]
def kasiskiExamination(self, ciphertext) -> List[int]:
# Find out the sequences of 3 to 5 letters that occur multiple times
# in the ciphertext. repeatedSeqSpacings has a value like:
# {'EXG': [192], 'NAF': [339, 972, 633], ... }
repeatedSeqSpacings = self.findRepeatSequencesSpacings(ciphertext)
max_factor = len(ciphertext) // 3
# (See getMostCommonFactors() for a description of seqFactors.)
seqFactors = {}
for seq in repeatedSeqSpacings:
seqFactors[seq] = []
for spacing in repeatedSeqSpacings[seq]:
seqFactors[seq].extend(self.getUsefulFactors(spacing, max_factor))
# (See getMostCommonFactors() for a description of factorsByCount.)
factorsByCount = self.getMostCommonFactors(seqFactors)
# Now we extract the factor counts from factorsByCount and
# put them in allLikelyKeyLengths so that they are easier to
# use later:
allLikelyKeyLengths = []
for twoIntTuple in factorsByCount:
allLikelyKeyLengths.append(twoIntTuple[0])
return allLikelyKeyLengths
def findRepeatSequencesSpacings(self, message):
# Goes through the message and finds any 3 to 5 letter sequences
# that are repeated. Returns a dict with the keys of the sequence and
# values of a list of spacings (num of letters between the repeats).
# Use a regular expression to remove non-letters from the message:
# Compile a list of seqLen-letter sequences found in the message:
seqSpacings = {} # Keys are sequences, values are lists of int spacings.
for seqLen in range(3, 6):
for seqStart in range(len(message) - seqLen):
# Determine what the sequence is, and store it in seq:
seq = message[seqStart : seqStart + seqLen]
# Look for this sequence in the rest of the message:
for i in range(seqStart + seqLen, len(message) - seqLen):
if message[i : i + seqLen] == seq:
# Found a repeated sequence.
if seq not in seqSpacings:
seqSpacings[seq] = [] # Initialize a blank list.
# Append the spacing distance between the repeated
# sequence and the original sequence:
seqSpacings[seq].append(i - seqStart)
return seqSpacings
def getUsefulFactors(self, num, max_factor: int):
# Returns a list of useful factors of num. By "useful" we mean factors
# less than MAX_KEY_LENGTH + 1 and not 1. For example,
# getUsefulFactors(144, 16) returns [2, 3, 4, 6, 8, 9, 12, 16]
if num < 2:
return []  # Numbers less than 2 have no useful factors.
factors = set()  # The set of factors found.
# When finding factors, you only need to check the integers up to
# MAX_KEY_LENGTH.
#
# Mathematician's note: whilst this is *definitely* suboptimal,
# for small numbers it's probably as good as other methods
for i in range(2, min(max_factor, num)):  # Don't test 1: it's not useful.
if num % i == 0:
factors.add(i)
otherFactor = num // i
if otherFactor < self.MAX_KEY_LENGTH + 1 and otherFactor != 1:
factors.add(otherFactor)
return list(factors)
#
def getMostCommonFactors(self, seqFactors):
# First, get a count of how many times a factor occurs in seqFactors:
factorCounts = {} # Key is a factor, value is how often it occurs.
# seqFactors keys are sequences, values are lists of factors of the
# spacings. seqFactors has a value like: {'GFD': [2, 3, 4, 6, 9, 12,
# 18, 23, 36, 46, 69, 92, 138, 207], 'ALW': [2, 3, 4, 6, ...], ...}
for seq in seqFactors:
factorList = seqFactors[seq]
for factor in factorList:
if factor not in factorCounts:
factorCounts[factor] = 0
factorCounts[factor] += 1
# Second, put the factor and its count into a tuple, and make a list
# of these tuples so we can sort them:
factorsByCount = []
for factor in factorCounts:
# Exclude factors larger than MAX_KEY_LENGTH:
if factor <= self.MAX_KEY_LENGTH:
# factorsByCount is a list of tuples: (factor, factorCount)
# factorsByCount has a value like: [(3, 497), (2, 487), ...]
factorsByCount.append((factor, factorCounts[factor]))
# Sort the list by the factor count:
factorsByCount.sort(key=lambda x: x[1], reverse=True)
return factorsByCount
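The Kasiski examination above can be illustrated with a dependency-free sketch: the spacings between repeated trigrams tend to be multiples of the key length, so the small factors shared by many spacings are the likely key lengths. Function names here are illustrative, not part of the class:

```python
from collections import Counter
from typing import Dict, List

def repeated_trigram_spacings(ctext: str) -> Dict[str, List[int]]:
    """Map each repeated trigram to the gaps between its occurrences."""
    positions: Dict[str, List[int]] = {}
    for i in range(len(ctext) - 2):
        positions.setdefault(ctext[i : i + 3], []).append(i)
    return {
        seq: [b - a for a, b in zip(pos, pos[1:])]
        for seq, pos in positions.items()
        if len(pos) > 1
    }

def likely_key_lengths(ctext: str, max_len: int = 16) -> List[int]:
    """Rank candidate key lengths by how many spacings they divide."""
    counts: Counter = Counter()
    for spacings in repeated_trigram_spacings(ctext).values():
        for s in spacings:
            for k in range(2, max_len + 1):
                if s % k == 0:
                    counts[k] += 1
    return [k for k, _ in counts.most_common()]
```

On a string with period 8 (e.g. `"abc12345" * 2 + "abc"`) every repeated trigram has spacing 8, so the candidates are exactly the factors 2, 4 and 8.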

View File

@ -0,0 +1 @@
from . import morse, bases, unicode, reverse

View File

@ -0,0 +1,45 @@
import base64
import types
import ciphey
import binascii
from typing import Callable, Optional, Any, Dict
from loguru import logger
def _dispatch(self: Any, ctext: str, func: Callable[[str], bytes]) -> Optional[bytes]:
logger.trace(f"Attempting {self.getTarget()}")
try:
result = func(ctext)
logger.debug(f"{self.getTarget()} successful, returning {result}")
return result
except ValueError:
logger.trace(f"Failed to decode {self.getTarget()}")
return None
_bases = {
"base16": (base64.b16decode, 0.4),
"base32": (base64.b32decode, 0.01),
"base64": (base64.b64decode, 0.4),
"base85": (base64.b85decode, 0.01),
"ascii85": (base64.a85decode, 0.1),
}
def gen_class(name, decoder, priority, ns):
ns["_get_func"] = ciphey.common.id_lambda(decoder)
ns["decode"] = lambda self, ctext: _dispatch(self, ctext, self._get_func())
ns["getParams"] = ciphey.common.id_lambda(None)
ns["getTarget"] = ciphey.common.id_lambda(name)
ns["priority"] = ciphey.common.id_lambda(priority)
ns["__init__"] = lambda self, config: super(type(self), self).__init__(config)
for name, (decoder, priority) in _bases.items():
t = types.new_class(name, (ciphey.iface.Decoder[str, bytes],),
exec_body=lambda x: gen_class(name, decoder, priority, x))
ciphey.iface.registry.register(t)
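Stripped of the registry metaprogramming, the pattern these generated decoders share is simply "call the codec, and turn a decode error into None". A standalone sketch of that dispatch idea (names here are illustrative, not ciphey's interface):

```python
import base64
from typing import Callable, Optional

def try_decode(ctext: str, func: Callable[[str], bytes]) -> Optional[bytes]:
    """Return decoded bytes, or None if ctext is not valid for this codec."""
    try:
        return func(ctext)
    except ValueError:  # binascii.Error is a ValueError subclass
        return None

CODECS = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": base64.b64decode,
    "base85": base64.b85decode,
    "ascii85": base64.a85decode,
}

def decode_any(ctext: str):
    """Yield (codec_name, plaintext) for every codec that accepts ctext."""
    for name, func in CODECS.items():
        res = try_decode(ctext, func)
        if res is not None:
            yield name, res
```

Note that several codecs can accept the same string, which is why the real decoders are tried in priority order rather than stopping at the first success.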

View File

@ -0,0 +1,101 @@
from typing import Optional, Dict, Any, List
import re
from loguru import logger
import ciphey
from ciphey.iface import registry
@registry.register
class MorseCode(ciphey.iface.Decoder[str, str]):
# A priority list for char/word boundaries
BOUNDARIES = {" ": 1, "/": 2, "\n": 3, ".": -1, "-": -1}
MAX_PRIORITY = 3
ALLOWED = {".", "-", " ", "/", "\n"}
MORSE_CODE_DICT: Dict[str, str]
MORSE_CODE_DICT_INV: Dict[str, str]
@staticmethod
def getTarget() -> str:
return "morse"
def decode(self, text: str) -> Optional[str]:
# Trim trailing separator chars (not "." or "-", which are morse symbols)
while text and self.BOUNDARIES.get(text[-1], 0) > 0:
text = text[:-1]
logger.trace("Attempting morse code")
char_boundary = word_boundary = None
char_priority = word_priority = 0
# Custom loop allows early break
for i in text:
i_priority = self.BOUNDARIES.get(i)
if i_priority is None:
logger.trace(f"Non-morse char '{i}' found")
return None
if i_priority <= char_priority or i == char_boundary or i == word_boundary:
continue
# Default to having a char boundary over a word boundary
if (
i_priority > word_priority
and word_boundary is None
and char_boundary is not None
):
word_priority = i_priority
word_boundary = i
continue
char_priority = i_priority
char_boundary = i
logger.trace(
f"Char boundary is '{char_boundary}', and word boundary is '{word_boundary}'"
)
result = ""
for word in text.split(word_boundary) if word_boundary else [text]:
logger.trace(f"Attempting to decode word {word}")
for char in word.split(char_boundary):
try:
m = self.MORSE_CODE_DICT_INV[char]
except KeyError:
logger.trace(f"Invalid codeword '{char}' found")
return None
result = result + m
# after every word add a space
result = result + " "
if len(result) == 0:
logger.trace("Morse code failed to match")
return None
# Remove trailing space
result = result[:-1]
logger.debug(f"Morse code successful, returning {result}")
return result.strip().upper()
@staticmethod
def getParams() -> Optional[Dict[str, ciphey.iface.ParamSpec]]:
return {
"dict": ciphey.iface.ParamSpec(
desc="The morse code dictionary to use",
req=False,
default="cipheydists::translate::morse",
)
}
@staticmethod
def getName() -> str:
return "morse"
@staticmethod
def priority() -> float:
return 0.05
def __init__(self, config: ciphey.iface.Config):
super().__init__(config)
self.MORSE_CODE_DICT = config.get_resource(
self._params()["dict"], ciphey.iface.WordList
)
self.MORSE_CODE_DICT_INV = {v: k for k, v in self.MORSE_CODE_DICT.items()}
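The boundary inference above does most of the work; once the char and word boundaries are known, the translation step itself is straightforward. A minimal sketch with fixed boundaries (space between letters, "/" between words) and a tiny inline table, where the real decoder infers boundaries and loads its table from cipheydists:

```python
from typing import Optional

# Deliberately tiny illustrative table; the real one covers the full alphabet
MORSE_INV = {".": "E", "-": "T", ".-": "A", "...": "S", "---": "O", "....": "H"}

def decode_morse(text: str) -> Optional[str]:
    """Translate morse with ' ' as the char boundary and '/' as the word boundary."""
    words = []
    for word in text.split("/"):
        letters = []
        for code in word.split():
            if code not in MORSE_INV:
                return None  # unknown codeword: this isn't morse
            letters.append(MORSE_INV[code])
        words.append("".join(letters))
    return " ".join(words)
```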

View File

@ -0,0 +1,24 @@
from typing import Optional, Dict, List
from ciphey.iface import ParamSpec, Config, T, U, Decoder, registry
@registry.register_multi((str, str), (bytes, bytes))
class Reverse(Decoder):
def decode(self, ctext: T) -> Optional[U]:
return ctext[::-1]
@staticmethod
def priority() -> float:
return 0.05
def __init__(self, config: Config):
super().__init__(config)
@staticmethod
def getParams() -> Optional[Dict[str, ParamSpec]]:
pass
@staticmethod
def getTarget() -> str:
return "reverse"

View File

@ -0,0 +1,38 @@
from typing import Optional, Dict, Any
from loguru import logger
import ciphey
from ciphey.iface import registry
@registry.register
class Utf8(ciphey.iface.Decoder[bytes, str]):
@staticmethod
def getTarget() -> str:
return "utf8"
def decode(self, text: bytes) -> Optional[str]:
logger.trace("Attempting utf-8 decode")
try:
res = text.decode("utf8")
logger.debug(f"utf-8 decode gave '{res}'")
return res if len(res) != 0 else None
except UnicodeDecodeError:
logger.trace("utf-8 decode failed")
return None
@staticmethod
def getParams() -> Optional[Dict[str, Dict[str, Any]]]:
pass
@staticmethod
def getName() -> str:
return "UTF-8"
@staticmethod
def priority() -> float:
return 0.9
def __init__(self, config: ciphey.iface.Config):
super().__init__(config)

View File

@ -0,0 +1 @@
from . import cipheydists, files

View File

@ -0,0 +1,39 @@
from typing import Optional, Dict, Any, Set
from functools import lru_cache
import loguru
import ciphey
import cipheydists
from ciphey.iface import ParamSpec, Config, registry, WordList, Distribution
@registry.register_multi(WordList, Distribution)
class CipheyDists(ciphey.iface.ResourceLoader):
# _wordlists: Set[str] = frozenset({"english", "english1000", "englishStopWords"})
# _brandons: Set[str] = frozenset({"english"})
# _dists: Set[str] = frozenset({"twist"})
# _translates: Set[str] = frozenset({"morse"})
_getters = {
"list": cipheydists.get_list,
"dist": cipheydists.get_dist,
"brandon": cipheydists.get_brandon,
"translate": cipheydists.get_translate,
}
def whatResources(self) -> Optional[Set[str]]:
pass
@lru_cache()
def getResource(self, name: str) -> Any:
loguru.logger.trace(f"Loading cipheydists resource {name}")
prefix, name = name.split("::", 1)
return self._getters[prefix](name)
def __init__(self, config: Config):
super().__init__(config)
@staticmethod
def getParams() -> Optional[Dict[str, ParamSpec]]:
pass

View File

@ -0,0 +1,65 @@
from abc import abstractmethod
from typing import Optional, Dict, Any, Set, Generic, Type
from functools import lru_cache
import ciphey
from ciphey.iface import T, ParamSpec, Config, get_args, registry
import json
import csv
# We can use a generic resource loader here, as we can instantiate it later
@registry.register_multi(ciphey.iface.WordList, ciphey.iface.Distribution)
class Json(ciphey.iface.ResourceLoader):
def whatResources(self) -> T:
return self._names
@lru_cache()
def getResource(self, name: str) -> T:
prefix, name = name.split("::", 1)
return {"wordlist": (lambda js: {js}), "dist": (lambda js: js)}[prefix](
json.load(open(self._paths[int(name) - 1]))
)
@staticmethod
def getName() -> str:
return "json"
@staticmethod
def getParams() -> Optional[Dict[str, ciphey.iface.ParamSpec]]:
return {"path": ParamSpec(req=True, desc="The path to a JSON file", list=True)}
def __init__(self, config: ciphey.iface.Config):
super().__init__(config)
self._paths = self._params()["path"]
self._names = set(range(1, len(self._paths) + 1))
# We can use a generic resource loader here, as we can instantiate it later
@registry.register_multi(ciphey.iface.WordList, ciphey.iface.Distribution)
class Csv(Generic[T], ciphey.iface.ResourceLoader[T]):
def whatResources(self) -> Set[str]:
return self._names
@lru_cache()
def getResource(self, name: str) -> T:
prefix, name = name.split("::", 1)
return {
"wordlist": (lambda reader: {i[0] for i in reader}),
"dist": (lambda reader: {i[0]: float(i[1]) for i in reader}),
}[prefix](csv.reader(open(self._paths[int(name) - 1])))
@staticmethod
def getName() -> str:
return "csv"
@staticmethod
def getParams() -> Optional[Dict[str, ciphey.iface.ParamSpec]]:
return {"path": ParamSpec(req=True, desc="The path to a CSV file", list=True)}
def __init__(self, config: ciphey.iface.Config):
super().__init__(config)
self._paths = self._params()["path"]
self._names = set(range(1, len(self._paths) + 1))

View File

@ -0,0 +1 @@
from . import ausearch, perfection

View File

@ -0,0 +1,183 @@
from collections import deque
import cipheycore
class Node:
"""
A node has a value assiocated with it
Calculated from the heuristic
"""
def __init__(
self, config, h: float = None, edges=None, ctext: str = None,
):
self.weight = h
# Edges is a list of other nodes it can connect to
self.edges = edges
self.ctext = ctext
self.h = h
self.path = []
self.information_content = config.cache.get_or_update(
self.ctext,
"cipheycore::info_content",
lambda: cipheycore.info_content(self.ctext),
)
def __le__(self, node2):
# Compare nodes by their heuristic weight
return self.h <= node2.h
def __lt__(self, node2):
return self.h < node2.h
def append_edge(self, edge):
self.edges.append(edge)
def get_edges(self):
return self.edges
class Graph:
# example of adjacency list (or rather map)
# adjacency_list = {
# 'A': [('B', 1), ('C', 3), ('D', 7)],
# 'B': [('D', 5)],
# 'C': [('D', 12)]
# }
def __init__(self, adjacency_list):
"""
adjacency list: basically the graph
"""
self.adjacency_list = adjacency_list
self.original_input = cipheycore.info_content(input)
def get_neighbors(self, v):
try:
return self.adjacency_list[v]
except KeyError:
# If we have exhausted the adjacency list
return []
# heuristic function with equal values for all nodes
def heuristic(self, n: Node):
return n.information_content / self.original_input
def a_star_algorithm(self, start_node: Node, stop_node: Node):
# TODO store the graph as an attribute
# open_list is a list of nodes which have been visited, but whose neighbors
# haven't all been inspected; it starts off with the start node
# closed_list is a list of nodes which have been visited
# and whose neighbors have been inspected
open_list = {start_node}
closed_list = set()
# g contains current distances from start_node to all other nodes
# the default value (if it's not found in the map) is +infinity
g = {}
g[start_node] = 0
# parents contains an adjacency map of all nodes
parents = {}
parents[start_node] = start_node
while len(open_list) > 0:
print(f"The open list is {open_list}")
n = None
# find a node with the lowest value of f() - evaluation function
for v in open_list:
# TODO if v == decoder, run the decoder
print(f"The for loop node v is {v}")
if n is None or g[v] + self.heuristic(v) < g[n] + self.heuristic(n):
n = v
print(f"The value of n is {n}")
if n is None:
print("Path does not exist!")
return None
# if the current node is the stop_node
# then we begin reconstructing the path from it to the start_node
# NOTE Uncomment this for an exit condition
# TODO Make it exit if decryptor returns True
# TODO We need to append the decryption methods to each node
# So when we reconstruct the path we can reconstruct the decryptions
# used
if n == stop_node:
print("n is the stop node, we are stopping!")
reconst_path = []
while parents[n] != n:
reconst_path.append(n)
n = parents[n]
reconst_path.append(start_node)
reconst_path.reverse()
print("Path found: {}".format(reconst_path))
return reconst_path
print(n)
for (m, weight) in self.get_neighbors(n):
print(f"And the iteration is ({m}, {weight})")
# if the current node isn't in both open_list and closed_list
# add it to open_list and note n as its parent
if m not in open_list and m not in closed_list:
open_list.add(m)
parents[m] = n
g[m] = g[n] + weight
# otherwise, check if it's quicker to first visit n, then m
# and if it is, update parent data and g data
# and if the node was in the closed_list, move it to open_list
else:
if g[m] > g[n] + weight:
g[m] = g[n] + weight
parents[m] = n
if m in closed_list:
closed_list.remove(m)
open_list.add(m)
# remove n from the open_list, and add it to closed_list
# because all of its neighbors were inspected
# open_list.remove(node)
# closed_list.add(node)
open_list.remove(n)
closed_list.add(n)
print("\n")
print("Path does not exist!")
return None
adjacency_list = {
"A": [("B", 1), ("C", 3), ("D", 7)],
"B": [("D", 5)],
"C": [("D", 12)],
}
A = Node(1)
B = Node(7)
C = Node(9)
D = Node(16)
A.edges = [(B, 1), (C, 3), (D, 7)]
B.edges = [(D, 5)]
C.edges = [(D, 12)]
# TODO use a dictionary comprehension to make this
adjacency_list = {
A: A.edges,
B: B.edges,
C: C.edges,
}
graph1 = Graph(adjacency_list)
graph1.a_star_algorithm(A, D)
"""
Maybe after it
"""

View File

@ -0,0 +1,174 @@
function reconstruct_path(cameFrom, current)
total_path := {current}
while current in cameFrom.Keys:
current := cameFrom[current]
total_path.prepend(current)
return total_path
// A* finds a path from start to goal.
// h is the heuristic function. h(n) estimates the cost to reach goal from node n.
function A_Star(graph, start, h)
// The set of discovered nodes that may need to be (re-)expanded.
// Initially, only the start node is known.
// This is usually implemented as a min-heap or priority queue rather than a hash-set.
openSet := {start}
// For node n, cameFrom[n] is the node immediately preceding it on the cheapest path from start
// to n currently known.
cameFrom := an empty map
// For node n, gScore[n] is the cost of the cheapest path from start to n currently known.
gScore := map with default value of Infinity
gScore[start] := 0
// For node n, fScore[n] := gScore[n] + h(n). fScore[n] represents our current best guess as to
// how short a path from start to finish can be if it goes through n.
fScore := map with default value of Infinity
fScore[start] := h(start)
// the exit condition is set to True when LC returns True
exit_condition = False
while not exit_condition
// This operation can occur in O(1) time if openSet is a min-heap or a priority queue
current := the node in openSet having the lowest fScore[] value
if current = goal
return reconstruct_path(cameFrom, current)
openSet.Remove(current)
for each neighbor of current
decodings = neighbor.decoders()
// d(current,neighbor) is the weight of the edge from current to neighbor
// tentative_gScore is the distance from start to the neighbor through current
tentative_gScore := gScore[current] + d(current, neighbor)
if tentative_gScore < gScore[neighbor]
// This path to neighbor is better than any previous one. Record it!
cameFrom[neighbor] := current
gScore[neighbor] := tentative_gScore
fScore[neighbor] := gScore[neighbor] + h(neighbor)
if neighbor not in openSet
openSet.add(neighbor)
# run the cracker on the object
crack(node.ctext)
if crack:
# if cracker returns true, reconstruct path and exit
exit_condition = True
reconstruct(start, node)
else:
# else add the new children of the cracker to openSet
openSet.append(node: crack)
// Open set is empty but goal was never reached
return failure
function calculate_new_children(node):
class Node:
"""
A node has a value assiocated with it
Calculated from the heuristic
"""
def __init__(self, h: float = None, edges: (any, float) = None, ctext: str = None):
self.weight = h
# Edges is a list of other nodes it can connect to
self.edges = edges
self.ctext = ctext
self.h = h
self.path = []
self.information_content = config.cache.get_or_update(
self.ctext,
"cipheycore::info_content",
lambda: cipheycore.info_content(self.ctext),
)
def __le__(self, node2):
# Compare nodes by their heuristic weight
return self.h <= node2.h
def __lt__(self, node2):
return self.h < node2.h
def append_edge(self, edge):
self.edges.append(edge)
def get_edges(self):
return self.edges

View File

@ -0,0 +1,183 @@
from abc import abstractmethod, ABC
from typing import Generic, List, Optional, Dict, Any, NamedTuple, Union, Set, Tuple
from ciphey.iface import (
T,
Cracker,
Config,
Searcher,
ParamSpec,
CrackInfo,
registry,
SearchLevel,
CrackResult,
SearchResult,
Decoder,
DecoderComparer,
)
from datetime import datetime
from loguru import logger
class Node(Generic[T], NamedTuple):
cracker: Cracker
parents: List[SearchLevel]
crack_info: CrackInfo
check_info: float
def __hash__(self):
return hash((type(self.cracker).__name__, len(self.parents)))
class AuSearch(Searcher, ABC):
@abstractmethod
def findBestNode(self, nodes: Set[Node]) -> Node:
pass
def handleDecodings(
self, target: Any
) -> (bool, Union[Tuple[SearchLevel, str], List[SearchLevel]]):
"""
If there exists a decoding that the checker returns true on, returns (True, result).
Otherwise, returns (False, names and successful decodings)
The CrackResult object should only have the value field filled in
MUST NOT recurse into decodings! evaluate does that for you!
"""
# This tag is necessary, as we could have a list as a decoding target, which would then screw over type checks
ret = []
decoders = []
for decoder_type, decoder_class in registry[Decoder][type(target)].items():
for decoder in decoder_class:
decoders.append(DecoderComparer(decoder))
# Fun fact:
# with Python's glorious lists, inserting n elements into the right position (with bisect) is O(n^2)
decoders.sort(reverse=True)
for decoder_cmp in decoders:
logger.trace(f"Inspecting {decoder_cmp}")
res = self._config()(decoder_cmp.value).decode(target)
if res is None:
continue
level = SearchLevel(
name=decoder_cmp.value.__name__.lower(),
result=CrackResult(value=res),
)
if type(res) == self._final_type:
check_res = self._checker(res)
if check_res is not None:
return True, (level, check_res)
ret.append(level)
return False, ret
def expand(
self, parents: List[SearchLevel], check: bool = True
) -> (bool, Union[SearchResult, List[Node]]):
result = parents[-1].result.value
# logger.debug(f"Expanding {parents}")
# Deduplication
if not self._config().cache.mark_ctext(result):
return False, []
if check and type(result) == self._final_type:
check_res = self._checker(result)
if check_res is not None:
return True, SearchResult(path=parents, check_res=check_res)
success, dec_res = self.handleDecodings(result)
if success:
return True, SearchResult(path=parents + [dec_res[0]], check_res=dec_res[1])
nodes: List[Node] = []
for decoding in dec_res:
# Don't check, as handleDecodings did that for us
success, eval_res = self.expand(parents + [decoding], check=False)
if success:
return True, eval_res
nodes.extend(eval_res)
crackers: List[Cracker] = registry[Cracker[type(result)]]
expected_time: float
# Worth doing this check twice to simplify code and allow an early return for decodings
if type(result) == self._final_type:
expected_time = self._checker.getExpectedRuntime(result)
else:
expected_time = 0
for i in crackers:
cracker = self._config()(i)
nodes.append(
Node(
cracker=cracker,
crack_info=cracker.getInfo(result),
check_info=expected_time,
parents=parents,
)
)
return False, nodes
def evaluate(self, node: Node) -> (bool, Union[List[SearchLevel], List[Node]]):
# logger.debug(f"Evaluating {node}")
res = node.cracker.attemptCrack(node.parents[-1].result.value)
# Detect if we succeeded, and if deduplication is needed
logger.trace(f"Got {len(res)} results")
ret = []
for i in res:
success, res = self.expand(
node.parents
+ [SearchLevel(name=type(node.cracker).__name__.lower(), result=i)]
)
if success:
return True, res
ret.extend(res)
return False, ret
def search(self, ctext: Any) -> List[SearchLevel]:
deadline = (
datetime.now() + self._config().objs["timeout"]
if self._config().timeout is not None
else datetime.max
)
success, expand_res = self.expand(
[SearchLevel(name="input", result=CrackResult(value=ctext))]
)
if success:
return expand_res
nodes = set(expand_res)
while datetime.now() < deadline:
# logger.trace(f"Have node tree {nodes}")
if len(nodes) == 0:
raise LookupError("Could not find any solutions")
best_node = self.findBestNode(nodes)
nodes.remove(best_node)
success, eval_res = self.evaluate(best_node)
if success:
# logger.trace(f"Success with node {best_node}")
return eval_res
nodes.update(eval_res)
raise TimeoutError("Search ran out of time")
@staticmethod
@abstractmethod
def getParams() -> Optional[Dict[str, ParamSpec]]:
pass
@abstractmethod
def __init__(self, config: Config):
super().__init__(config)
self._checker = config.objs["checker"]
self._final_type = config.objs["format"]["out"]

View File

@ -0,0 +1,233 @@
import heapq
class Imperfection:
"""The graph is a Node: [List of nodes]
Where each item in the list of nodes can also have a node with a list of nodes
Ths result is that we can keep track of edges, while also keeping it small
To calculate current, we push the entire graph to A*
And it calculates the next node to choose, as well as increasing the size
of the graph with values
We're using a heap, meaing the element at [0] is always the smallest element
So we choose that and return it.
The current A* implemnentation has an end, we simply do not let it end as LC will make it
end far before itreaches Searcher again.
Current is the start position, so if we say we always start at the start of the graph it'll
go through the entire graph
graph = {
Node: [
{Node :
{
node
}
}
]
}
For encodings we just do them straight out
The last value of parents from abstract
"""
"""
graph = {'A': ['B', 'C'],
'B': ['C', 'D'],
'C': ['D'],
'D': ['C'],
'E': ['F'],
'F': ['C']}"""
def __init__(self):
pass
def findBestNode(self, nodes):
"""Finds the best decryption module"""
return next(iter(nodes))
# def aStar(self, graph, current, end):
# """The A* search algorithm
# We're using heaps to find the minimum element (the one that will be the next current)
# Heaps are like sets with O(1) lookup time, but maintain the lowest element as [0]
# Sets insert in O(1), heaps in O(log N).
# https://stackoverflow.com/questions/4159331/python-speed-up-an-a-star-pathfinding-algorithm
# Current appears to be the list of all new tiles we can reach from current location
# End is the end node, that won't actually run bc LC will make it return before it hits the aStar function
# so tbh I'll just make it infinite unless something else forces a return
# The graph is the actual data structure used. According to StackOverflow, it looks like this:
# graph = {'A': ['B', 'C'],
# 'B': ['C', 'D'],
# 'C': ['D'],
# 'D': ['C'],
# 'E': ['F'],
# 'F': ['C']}
# """
# # Runs decodings first
# openSet = set()
# openHeap = []
# closedSet = set()
# def retracePath(c):
# # Retraces a path back to the start
# path = [c]
# while c.parent is not None:
# c = c.parent
# path.append(c)
# path.reverse()
# return path
# # Adds the current location (start) to the heap and set
# openSet.add(current)
# openHeap.append((0, current))
# # while openSet contains items
# while openSet:
# # TODO change openSet to a heap?
# # gets the 2nd element from the first element of the heap
# # so the heap is (0, current)
# # which means we pop current
# # this makes me think that current isn't the first?
# current = heapq.heappop(openHeap)[1]
# # We don't actually want to end, so I'm commenting this:
# # XXX
# if current == end:
# return retracePath(current)
# # Removes it from todo and into done i think
# # closedSet appears to be the set of things we have done
# openSet.remove(current)
# closedSet.add(current)
# """
# Okay so our graph looks like this:
# graph = {
# Node: [
# {Node :
# {
# node
# }
# }
# ]
# }
# graph[current] **SHOULD** be the list of nodes which contains dictionaries of nodes
# """
# for tile in graph[current]:
# # ClosedSet appears to be the list of visited nodes
# # TODO place this as a class attribute
# if tile not in closedSet:
# # This is the heuristic
# # TODO expected_time/probability + k * heuristic, for some experimentally determined value of k
# tile.H = (abs(end.x - tile.x) + abs(end.y - tile.y)) * 10
# # if tile is not in the openSet, add it and then pop it from the heap
# if tile not in openSet:
# openSet.add(tile)
# heapq.heappush(openHeap, (tile.H, tile))
# # I have no idea where this code is called lol
# tile.parent = current
# # This returns Nothing
# # I need to modify it so it finds the best item from Current
# # So basically, return item 0 of openHeap
# # return openHeap[0]
# # Since the [0] item is always minimum
# return []
def aStar(self, graph, current, end):
print(f"The graph is {graph}\nCurrent is {current}\n and End is {end}")
openSet = set()
openHeap = []
closedSet = set()
def retracePath(c):
print("Calling retrace path")
path = [c]
while c.parent is not None:
c = c.parent
path.append(c)
path.reverse()
return path
print("\n")
openSet.add(current)
openHeap.append((0, current))
while openSet:
print(f"Openset is {openSet}")
print(f"OpenHeap is {openHeap}")
print(f"ClosedSet is {closedSet}")
print(f"Current is {current}")
print(f"I am popping {openHeap} with the first element")
current = heapq.heappop(openHeap)[1]
print(f"Current is now {current}")
print(f"Graph current is {graph[current]}")
if current == end:
return retracePath(current)
openSet.remove(current)
closedSet.add(current)
for tile in graph[current]:
if tile not in closedSet:
tile.H = (abs(end.x - tile.x) + abs(end.y - tile.y)) * 10
tile.H = 1  # NOTE: overrides the distance heuristic above, making the search uniform-cost for now
if tile not in openSet:
openSet.add(tile)
heapq.heappush(openHeap, (tile.H, tile))
tile.parent = current
print("\n")
return []
class Node:
"""
A node has a value associated with it
Calculated from the heuristic
"""
def __init__(self, h):
self.h = h
self.x = self.h
self.y = 0.6
def __le__(self, node2):
# if self is less than other
return self.x <= node2.x
def __lt__(self, node2):
return self.x < node2.x
if __name__ == "__main__":
obj = Imperfection()
graph = {
"A": ["B", "C"],
"B": ["C", "D"],
"C": ["D"],
"D": ["C"],
"E": ["F"],
"F": ["C"],
}
# Makes the graph
y = Node(0.5)
x = Node(0.3)
p = Node(0.7)
q = Node(0.9)
graph = {y: [x, p], p: q}
print(obj.aStar(graph, y, q))
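The heap property the docstring above leans on (the element at `[0]` is always the smallest) can be sketched in isolation. This `Node` is a local stand-in with the same `__lt__` ordering as the class above; the heuristic values are arbitrary:

```python
import heapq


class Node:
    """A graph node ordered by its heuristic value h."""

    def __init__(self, h):
        self.h = h

    def __lt__(self, other):
        return self.h < other.h


heap = []
for h in (0.5, 0.3, 0.7, 0.9):
    heapq.heappush(heap, (h, Node(h)))

# The heap invariant: the smallest (h, node) pair is always at index 0,
# so "find the best node" is just a peek or a pop
best = heapq.heappop(heap)[1]
print(best.h)      # 0.3
print(heap[0][0])  # 0.5, the next-best heuristic
```

This is why the search never has to sort the open set: pushing is O(log N), and the best candidate is always available at the front.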

View File

@ -0,0 +1,31 @@
from abc import abstractmethod
from typing import Set, Any, Union, List, Optional, Dict, Tuple
from loguru import logger
from .ausearch import Node, AuSearch
from ciphey.iface import (
SearchLevel,
Config,
registry,
CrackResult,
Searcher,
ParamSpec,
Decoder,
DecoderComparer,
)
import bisect
@registry.register
class Perfection(AuSearch):
@staticmethod
def getParams() -> Optional[Dict[str, ParamSpec]]:
pass
def findBestNode(self, nodes: Set[Node]) -> Node:
return next(iter(nodes))
def __init__(self, config: Config):
super().__init__(config)

View File

@ -0,0 +1 @@
from . import Checkers, Crackers, Decoders, Resources, Searchers

20
ciphey/common.py Normal file
View File

@ -0,0 +1,20 @@
"""Some useful adapters"""
from typing import Any
import cipheycore
def id_lambda(value: Any):
"""
A function used in dynamic class generation that abstracts away a constant return value (like in getName)
"""
return lambda *args: value
def cached_freq_analysis(ctext, config):
base = config.objs.setdefault("cached_freq_analysis", {})
res = base.get(ctext)
if res is not None:
return res
res = base[ctext] = cipheycore.analyse_string(ctext)
return res

17
ciphey/iface/__init__.py Normal file
View File

@ -0,0 +1,17 @@
from ._config import Config
from ._modules import \
Decoder, DecoderComparer, \
Cracker, CrackResult, CrackInfo, \
Checker, \
Searcher, SearchResult, SearchLevel, \
ResourceLoader, \
ParamSpec, \
WordList, Distribution, \
T, U, \
pretty_search_results
from . import _registry
from ._registry import get_args, get_origin
from ._fwd import registry

203
ciphey/iface/_config.py Normal file
View File

@ -0,0 +1,203 @@
import os
from abc import ABC, abstractmethod
from typing import (
Any,
Dict,
Optional,
List,
Tuple,
Type,
Union, Callable,
)
import pydoc
from loguru import logger
import datetime
import yaml
import appdirs
from . import _fwd
from ._modules import Checker, Searcher, ResourceLoader
class Cache:
"""Used to track state between levels of recursion to stop infinite loops, and to optimise repeating actions"""
_cache: Dict[Any, Dict[str, Any]] = {}
def mark_ctext(self, ctext: Any) -> bool:
if (type(ctext) == str or type(ctext) == bytes) and len(ctext) < 4:
logger.trace(f"Candidate {ctext} too short!")
return False
if ctext in self._cache:
logger.trace(f"Deduped {ctext}")
return False
self._cache[ctext] = {}
return True
def get_or_update(self, ctext: Any, keyname: str, get_value: Callable[[], Any]):
# Should have been marked first
target = self._cache[ctext]
res = target.get(keyname)
if res is not None:
return res
val = get_value()
target[keyname] = val
return val
def split_resource_name(full_name: str) -> "Tuple[str, str]":
return full_name.split("::", 1)
class Config:
verbosity: int = 0
searcher: str = "perfection"
params: Dict[str, Dict[str, Union[str, List[str]]]] = {}
format: Dict[str, str] = {"in": "str", "out": "str"}
modules: List[str] = []
checker: str = "brandon"
default_dist: str = "cipheydists::dist::twist"
timeout: Optional[int] = None
_inst: Dict[type, Any] = {}
objs: Dict[str, Any] = {}
cache: Cache = Cache()
@staticmethod
def get_default_dir() -> str:
return appdirs.user_config_dir("ciphey")
def merge_dict(self, config_file: Optional[Dict[str, Any]]):
if config_file is None:
return
for a, b in config_file.items():
self.update(a, b)
def load_file(self, path: str = os.path.join(get_default_dir.__func__(), "config.yml"), create=False):
try:
with open(path, "r+") as file:
return self.merge_dict(yaml.safe_load(file))
except FileNotFoundError:
if create:
open(path, "w+")
def instantiate(self, t: type) -> Any:
"""
Used to enable caching of an instantiated type after the configuration has settled
"""
# We cannot use setdefault, as that would construct the object again and throw away the result
res = self._inst.get(t)
if res is not None:
return res
ret = t(self)
self._inst[t] = ret
return ret
def __call__(self, t: type) -> Any:
return self.instantiate(t)
def update(self, attrname: str, value: Optional[Any]):
if value is not None:
setattr(self, attrname, value)
def update_param(self, owner: str, name: str, value: Optional[Any]):
if value is None:
return
target = self.params.setdefault(owner, {})
if _fwd.registry.get_named(owner).getParams()[name].list:
target.setdefault(name, []).append(value)
else:
target[name] = value
def update_format(self, paramname: str, value: Optional[Any]):
if value is not None:
self.format[paramname] = value
def load_objs(self):
# Basic type conversion
if self.timeout is not None:
self.objs["timeout"] = datetime.timedelta(seconds=int(self.timeout))
self.objs["format"] = {
key: pydoc.locate(value) for key, value in self.format.items()
}
# Checkers do not depend on anything
self.objs["checker"] = self(_fwd.registry.get_named(self.checker, Checker))
# Searchers only depend on checkers
self.objs["searcher"] = self(_fwd.registry.get_named(self.searcher, Searcher))
def update_log_level(self, verbosity: Optional[int]):
if verbosity is None:
return
self.verbosity = verbosity
quiet_list = [
"ERROR",
"CRITICAL",
]
loud_list = [
"DEBUG",
"TRACE"
]
verbosity_name: str
if verbosity == 0:
verbosity_name = "WARNING"
elif verbosity > 0:
verbosity_name = loud_list[min(len(loud_list), verbosity) - 1]
else:
verbosity_name = quiet_list[min(len(quiet_list), -verbosity) - 1]
from loguru import logger
import sys
logger.remove()
if self.verbosity is None:
return
logger.configure()
if self.verbosity > 0:
logger.add(sink=sys.stderr, level=verbosity_name, colorize=sys.stderr.isatty())
logger.opt(colors=True)
else:
logger.add(
sink=sys.stderr, level=verbosity_name, colorize=False, format="{message}"
)
logger.debug(f"Verbosity set to level {verbosity} ({verbosity_name})")
def load_modules(self):
import importlib.util
for i in self.modules:
spec = importlib.util.spec_from_file_location("ciphey.module_load_site", i)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
def get_resource(self, res_name: str, t: Optional[Type] = None) -> Any:
logger.trace(f"Loading resource {res_name} of type {t}")
# FIXME: Actually returns obj of type `t`, but python is bad
loader, name = split_resource_name(res_name)
if t is None:
return self(_fwd.registry.get_named(loader, ResourceLoader))(name)
else:
return self(_fwd.registry.get_named(loader, ResourceLoader[t]))(name)
def __str__(self):
return str({
"verbosity": self.verbosity,
"searcher": self.searcher,
"params": self.params,
"format": self.format,
"modules": self.modules,
"checker": self.checker,
"default_dist": self.default_dist,
"timeout": self.timeout
})
_fwd.config = Config
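The `Cache` contract above (reject short candidates, deduplicate repeats, memoise per-ctext values) can be exercised standalone. This sketch drops the loguru calls and uses an instance attribute instead of the class-level dict:

```python
from typing import Any, Callable, Dict


class Cache:
    """Tracks seen ciphertexts to stop infinite recursion, and memoises per-ctext values."""

    def __init__(self):
        self._cache: Dict[Any, Dict[str, Any]] = {}

    def mark_ctext(self, ctext: Any) -> bool:
        if isinstance(ctext, (str, bytes)) and len(ctext) < 4:
            return False  # too short to be worth recursing into
        if ctext in self._cache:
            return False  # already visited: deduplicate
        self._cache[ctext] = {}
        return True

    def get_or_update(self, ctext: Any, keyname: str, get_value: Callable[[], Any]):
        target = self._cache[ctext]  # ctext must have been marked first
        res = target.get(keyname)
        if res is not None:
            return res
        val = get_value()
        target[keyname] = val
        return val


cache = Cache()
print(cache.mark_ctext("abc"))    # False: under 4 characters
print(cache.mark_ctext("hello"))  # True: first visit
print(cache.mark_ctext("hello"))  # False: second visit is deduped
print(cache.get_or_update("hello", "len", lambda: len("hello")))  # 5
```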

2
ciphey/iface/_fwd.py Normal file
View File

@ -0,0 +1,2 @@
registry = None
config = type(None)

304
ciphey/iface/_modules.py Normal file
View File

@ -0,0 +1,304 @@
from abc import ABC, abstractmethod
from typing import (
Any,
Callable,
Dict,
Generic,
Optional,
List,
NamedTuple,
TypeVar,
Type,
Union,
Set,
)
import pydoc
from loguru import logger
import datetime
from ._fwd import config as Config
T = TypeVar("T")
U = TypeVar("U")
class ParamSpec(NamedTuple):
"""
Attributes:
req Whether this argument is required
desc A description of what this argument does
default The default value for this argument. Ignored if req == True or config_ref is not None
config_ref The path to the config that should be the default value
list Whether this parameter is in the form of a list, and can therefore be specified more than once
visible Whether the user can tweak this via the command line
"""
req: bool
desc: str
default: Optional[Any] = None
list: bool = False
config_ref: Optional[List[str]] = None
visible: bool = False
class ConfigurableModule(ABC):
@staticmethod
@abstractmethod
def getParams() -> Optional[Dict[str, ParamSpec]]:
"""
Returns a dictionary of `argument name: argument specification`
"""
pass
def _checkParams(self):
"""
Fills the given params dict with default values where arguments are not given,
using None as the default value for default values
"""
params = self._params()
config = self._config()
for key, value in self.getParams().items():
# If we already have it, then we don't need to do anything
if key in params:
continue
# If we don't have it, but it's required, then fail
if value.req:
raise KeyError(
f"Missing required param {key} for {type(self).__name__.lower()}"
)
# If it's a reference by default, fill that in
if value.config_ref is not None:
tmp = getattr(config, value.config_ref[0])
params[key] = (
tmp[value.config_ref[1:]] if len(value.config_ref) > 1 else tmp
)
# Otherwise, put in the default value (if it exists)
elif value.default is not None:
params[key] = value.default
def _params(self):
return self._params_obj
def _config(self):
return self._config_obj
@abstractmethod
def __init__(self, config: Config):
self._config_obj = config
if self.getParams() is not None:
self._params_obj = config.params.setdefault(type(self).__name__.lower(), {})
self._checkParams()
class Targeted(ABC):
@staticmethod
@abstractmethod
def getTarget() -> str:
"""Should return the target that this object attacks/decodes"""
pass
class Checker(Generic[T], ConfigurableModule):
@abstractmethod
def check(self, text: T) -> Optional[str]:
"""Should return some description (or an empty string) on success, otherwise return None"""
pass
@abstractmethod
def getExpectedRuntime(self, text: T) -> float:
pass
def __call__(self, *args):
return self.check(*args)
@abstractmethod
def __init__(self, config: Config):
super().__init__(config)
# class Detector(Generic[T], ConfigurableModule, KnownUtility, Targeted):
# @abstractmethod
# def scoreLikelihood(self, ctext: T) -> Dict[str, float]:
# """Should return a dictionary of (cipher_name: score)"""
# pass
#
# def __call__(self, *args): return self.scoreLikelihood(*args)
#
# @abstractmethod
# def __init__(self, config: Config): super().__init__(config)
class Decoder(Generic[T, U], ConfigurableModule, Targeted):
"""Represents the undoing of some encoding into a different (or the same) type"""
@abstractmethod
def decode(self, ctext: T) -> Optional[U]:
pass
@staticmethod
@abstractmethod
def priority() -> float:
"""What proportion of decodings are this?"""
pass
def __call__(self, *args):
return self.decode(*args)
@abstractmethod
def __init__(self, config: Config):
super().__init__(config)
class DecoderComparer:
value: Type[Decoder]
def __le__(self, other: "DecoderComparer"):
return self.value.priority() <= other.value.priority()
def __ge__(self, other: "DecoderComparer"):
return self.value.priority() >= other.value.priority()
def __lt__(self, other: "DecoderComparer"):
return self.value.priority() < other.value.priority() and self != other
def __gt__(self, other: "DecoderComparer"):
return self.value.priority() > other.value.priority() and self != other
def __init__(self, value: Type[Decoder]):
self.value = value
def __repr__(self):
return f"<DecoderComparer {self.value}:{self.value.priority()}>"
class CrackResult(NamedTuple, Generic[T]):
value: T
key_info: Optional[str] = None
misc_info: Optional[str] = None
class CrackInfo(NamedTuple):
success_likelihood: float
success_runtime: float
failure_runtime: float
class Cracker(Generic[T], ConfigurableModule, Targeted):
@abstractmethod
def getInfo(self, ctext: T) -> CrackInfo:
"""Should return some informed guesses on resource consumption when run on `ctext`"""
pass
@abstractmethod
def attemptCrack(self, ctext: T) -> List[CrackResult]:
"""
This should attempt to crack the ciphertext `ctext`, and return a list of candidate solutions
"""
# FIXME: Actually CrackResult[T], but python complains
pass
def __call__(self, *args):
return self.attemptCrack(*args)
@abstractmethod
def __init__(self, config: Config):
super().__init__(config)
class ResourceLoader(Generic[T], ConfigurableModule):
@abstractmethod
def whatResources(self) -> Optional[Set[str]]:
"""
Return a set of the names of instances T you can provide.
The names SHOULD be unique amongst ResourceLoaders of the same type
These names will be exposed as f"{self.__name__}::{name}", use split_resource_name to recover this
If you cannot reasonably determine what resources you provide, return None instead
"""
pass
@abstractmethod
def getResource(self, name: str) -> T:
"""
Returns the requested distribution
The behaviour is undefined if `name not in self.whatResources()`
"""
pass
def __call__(self, *args):
return self.getResource(*args)
def __getitem__(self, *args):
return self.getResource(*args)
@abstractmethod
def __init__(self, config: Config):
super().__init__(config)
class SearchLevel(NamedTuple):
name: str
result: CrackResult
class SearchResult(NamedTuple):
path: List[SearchLevel]
check_res: str
class Searcher(ConfigurableModule):
"""A very basic interface for code that plans out how to crack the ciphertext"""
@abstractmethod
def search(self, ctext: Any) -> SearchResult:
"""Returns the path to the correct ciphertext"""
pass
@abstractmethod
def __init__(self, config: Config):
super().__init__(config)
def pretty_search_results(res: SearchResult, display_intermediate: bool = False):
ret: str = f'Final result: "{res.path[-1].result.value}"\n'
if len(res.check_res) != 0:
ret += f"Checker: {res.check_res}\n"
ret += "Format used:\n"
def add_one():
nonlocal ret
ret += f" {i.name}"
already_broken = False
if i.result.key_info is not None:
ret += f":\n Key: {i.result.key_info}\n"
already_broken = True
if i.result.misc_info is not None:
if not already_broken:
ret += ":\n"
ret += f" Misc: {i.result.misc_info}\n"
already_broken = True
if display_intermediate:
if not already_broken:
ret += ":\n"
ret += f' Value: "{i.result.value}"\n'
already_broken = True
if not already_broken:
ret += "\n"
# Skip the 'input' and print in reverse order
for i in res.path[1:][::-1]:
add_one()
# Remove trailing newline
return ret[:-1]
# Some common collection types
Distribution = Dict[str, float]
WordList = Set[str]
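To make the `_checkParams` contract concrete, here is a sketch of the default-filling rules using a hypothetical parameter table (the `expected`/`lower`/`key` names are invented for illustration; `ParamSpec` is reproduced from above):

```python
from typing import Any, Dict, List, NamedTuple, Optional


class ParamSpec(NamedTuple):
    req: bool
    desc: str
    default: Optional[Any] = None
    list: bool = False
    config_ref: Optional[List[str]] = None
    visible: bool = False


# A hypothetical cracker's parameter table
params: Dict[str, ParamSpec] = {
    "expected": ParamSpec(req=False, desc="Expected plaintext distribution",
                          config_ref=["default_dist"]),
    "lower": ParamSpec(req=False, desc="Lowercase the input first", default=True),
    "key": ParamSpec(req=True, desc="A known key", list=True),
}

# Default-filling as in _checkParams: missing required params raise,
# config_ref params would come from the Config, others take their default
given: Dict[str, Any] = {"key": ["3"]}
for name, spec in params.items():
    if name in given:
        continue
    if spec.req:
        raise KeyError(f"Missing required param {name}")
    if spec.config_ref is None and spec.default is not None:
        given[name] = spec.default

print(given)  # {'key': ['3'], 'lower': True}
```

Note that `expected` stays unfilled here: in the real `_checkParams`, its value would be resolved from the `Config` attribute named by `config_ref`.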

143
ciphey/iface/_registry.py Normal file
View File

@ -0,0 +1,143 @@
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import (
Any,
Callable,
Dict,
Generic,
Optional,
List,
NamedTuple,
TypeVar,
Type,
Union,
Set,
Tuple,
)
import pydoc
try:
from typing import get_origin, get_args
except ImportError:
from typing_inspect import get_origin, get_args
from loguru import logger
from . import _fwd
from ._modules import *
import datetime
class Registry:
# I was planning on using __init_subclass__, but that is incompatible with dynamic type creation when we have
# generic keys
RegElem = Union[List[Type], Dict[Type, "RegElem"]]
_reg: Dict[Type, RegElem] = {}
_names: Dict[str, Tuple[Type, Set[Type]]] = {}
_targets: Dict[str, Dict[Type, List[Type]]] = {}
_modules = {Checker, Cracker, Decoder, ResourceLoader, Searcher}
def _register_one(self, input_type, module_base, module_args):
target_reg = self._reg.setdefault(module_base, {})
# Seek to the given type
for subtype in module_args[0:-1]:
target_reg = target_reg.setdefault(subtype, {})
target_reg.setdefault(module_args[-1], []).append(input_type)
def _real_register(self, input_type: type, *args) -> Type:
name_target = self._names[input_type.__name__.lower()] = (input_type, set())
if issubclass(input_type, Targeted):
target = input_type.getTarget()
else:
target = None
if issubclass(input_type, Searcher):
module_type = module_base = Searcher
module_args = ()
else:
module_type: Optional[Type] = None
module_base = None
# Work out what module type this is
if len(args) == 0:
for i in input_type.__orig_bases__:
if module_type is not None:
raise TypeError(f"Type derived from multiple registrable base classes {i} and {module_type}")
module_base = get_origin(i)
if module_base not in self._modules:
continue
module_type = i
else:
for i in self._modules:
if not issubclass(input_type, i):
continue
if module_type is not None:
raise TypeError(f"Type derived from multiple registrable base classes {i} and {module_type}")
module_type = i
if module_type is None:
raise TypeError("No registrable base class")
# Now handle the difference between register and register_multi
if len(args) == 0:
if module_base is None:
raise TypeError("No type argument given")
self._register_one(input_type, module_base, get_args(module_type))
name_target[1].add(module_base)
else:
if module_base is not None:
raise TypeError(f"Redundant type argument for {module_type}")
module_base = module_type
for module_args in args:
# Correct missing brackets
if not isinstance(module_args, tuple):
module_args = (module_args,)
self._register_one(input_type, module_base, module_args)
name_target[1].add(module_type[module_args])
name_target[1].add(module_type)
if target is not None and issubclass(module_base, Targeted):
self._targets.setdefault(target, {}).setdefault(module_type, []).append(input_type)
return input_type
def register(self, input_type):
return self._real_register(input_type)
def register_multi(self, *x):
return lambda input_type: self._real_register(input_type, *x)
def __getitem__(self, i: type) -> Optional[Any]:
target_type = get_origin(i)
# Check if this is a non-generic type, and return the whole dict if it is
if target_type is None:
return self._reg[i]
target_subtypes = get_args(i)
target_list = self._reg.setdefault(target_type, {})
for subtype in target_subtypes:
target_list = target_list.setdefault(subtype, {})
return target_list
def get_named(self, name: str, type_constraint: Type = None) -> Any:
ret = self._names[name.lower()]
if type_constraint and type_constraint not in ret[1]:
raise TypeError(f"Type mismatch: wanted {type_constraint}, got {ret[1]}")
return ret[0]
def get_targeted(
self, target: str, type_constraint: Type = None
) -> Optional[Union[Dict[Type, Set[Type]], Set[Type]]]:
x = self._targets.get(target)
if x is None or type_constraint is None:
return x
return x.get(type_constraint)
def __str__(self):
return f"ciphey.iface.Registry {{_reg: {self._reg}, _names: {self._names}, _targets: {self._targets}}}"
_fwd.registry = Registry()

View File

@ -91,44 +91,44 @@ class mathsHelper:
while counter_max < counter_prob:
max_overall = 0
highest_key = None
logger.debug(
logger.trace(
f"Running while loop in sort_prob_table, counterMax is {counter_max}"
)
for key, value in prob_table.items():
logger.debug(f"Sorting {key}")
logger.trace(f"Sorting {key}")
maxLocal = 0
# for each item in that table
for key2, value2 in value.items():
logger.debug(
logger.trace(
f"Running key2 {key2}, value2 {value2} for loop for {value.items()}"
)
maxLocal = maxLocal + value2
logger.debug(
logger.trace(
f"MaxLocal is {maxLocal} and maxOverall is {max_overall}"
)
if maxLocal > max_overall:
logger.debug(f"New max local found {maxLocal}")
logger.trace(f"New max local found {maxLocal}")
# because the dict doesnt reset
max_dict_pair = {}
max_overall = maxLocal
# so eventually, we get the maximum dict pairing?
max_dict_pair[key] = value
highest_key = key
logger.debug(f"Highest key is {highest_key}")
logger.trace(f"Highest key is {highest_key}")
# removes the highest key from the prob table
logger.debug(f"Prob table is {prob_table} and highest key is {highest_key}")
logger.debug(f"Removing {prob_table[highest_key]}")
logger.trace(f"Prob table is {prob_table} and highest key is {highest_key}")
logger.trace(f"Removing {prob_table[highest_key]}")
del prob_table[highest_key]
logger.debug(f"Prob table after deletion is {prob_table}")
logger.trace(f"Prob table after deletion is {prob_table}")
counter_max += 1
empty_dict = {**empty_dict, **max_dict_pair}
# returns the max dict (at the start) with the prob table
# this way, it should always work on most likely first.
logger.debug(
logger.trace(
f"The prob table is {prob_table} and the maxDictPair is {max_dict_pair}"
)
logger.debug(f"The new sorted prob table is {empty_dict}")
logger.trace(f"The new sorted prob table is {empty_dict}")
return empty_dict
@staticmethod
@ -145,11 +145,11 @@ class mathsHelper:
"""
# (f"d is {d}")
logger.debug(f"The old dictionary before new_sort() is {new_dict}")
logger.trace(f"The old dictionary before new_sort() is {new_dict}")
sorted_i = OrderedDict(
sorted(new_dict.items(), key=lambda x: x[1], reverse=True)
)
logger.debug(f"The dictionary after new_sort() is {sorted_i}")
logger.trace(f"The dictionary after new_sort() is {sorted_i}")
# sortedI = sort_dictionary(x)
return sorted_i
@ -185,7 +185,3 @@ class mathsHelper:
"""
text: str = str(text).translate(str.maketrans("", "", punctuation))
return text

View File

@ -1,71 +0,0 @@
# i need the below code to make tensorflow shut up
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import tensorflow as tf
from scipy.stats import chisquare
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.layers import (
Activation,
Conv2D,
Dense,
Dropout,
Flatten,
MaxPooling2D,
Reshape,
)
from tensorflow.keras.models import Sequential, load_model
from string import punctuation
import numpy
import sys
import cipheydists
sys.path.append("..")
try:
import ciphey.mathsHelper as mh
except ModuleNotFoundError:
import mathsHelper as mh
# i need the below code to make tensorflow shut up. Yup, it's SO bad you have to have 2 LINES TO MAKE IT SHUT UP!!!
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
class NeuralNetwork:
"""
Class to use the neural network
"""
def __init__(self):
self.CATEGORIES = ["sha1", "md5", "sha256", "sha512", "caeser", "plaintext"]
self.CATEGORIES = [1, 2, 3, 4, 5, 6]
# self.MODEL = load_model("cipher_detector.h5")
self.MODEL = load_model(cipheydists.get_model("cipher_detector"))
self.mh = mh.mathsHelper()
def formatData(self, text):
"""
formats the data
"""
result = []
result.append(len(text))
result.append(len(list(set(list(text)))))
return result
def editData(self, data):
"""
Data has to be in format:
* [length of text, how many unique letters it has, the normalised chi square score]
"""
new = []
new.append(self.formatData(data))
return numpy.asarray(new)
def predictnn(self, text):
"""
use this to create predictions for the NN
returns softmax (probability distribution)
"""
text = self.editData(text)
return self.MODEL.predict(text)

0
cipheytreesearch.md Normal file
View File

View File

@ -13,6 +13,46 @@ These are taken from the GitHub Issues tab.
* The code is now pep8'd
* Move to Poetry from setuptools.py
* Moved to Pytest from unittest
## 5.0.0 _The Great Refactor_
_The Great Refactor_ is version 5 of Ciphey. The entire program was refactored.
#### Features
* Added base58 Bitcoin
* Added base58 Ripple
* Added Base62 (link shortener char set)
* Added base85
* Added base85 ascii
* **A brand new cipher detection interface**
* **Much faster, much more accurate `brandon` interface, which is the default plaintext checking interface**
* Recursive decryption methods. Is your text base64 -> binary -> caesar -> vigenere? Ciphey can handle it now. I was told I'm not supposed to talk about nerdy things in the changelog but.... We're using A* search with the weight being how many computations it takes and the heuristic being the likelihood chance. Pretty nifty!
* Now on Winget (Windows Package Manager)
* Brandon interface now has a stopwords checker, with 97% accuracy and high speeds (0.0000006 seconds on average).
* Brandon checker's dictionary checker has 99% accuracy on average across all sentence lengths and completes in 0.002 seconds.
* Added a regex checker, so the user can enter a regex like `THM{*}` and the checker will find it.
* Added a neural network that can detect English.
* Created `settings.yml`, a settings file which lets the user change how Ciphey works internally.
* Added flag `--where`, which tells you where Ciphey expects the `settings.yml` file to be.
* Added `regexFile` to `settings.yml`, which is where the user can store all the regexes they want the regex checker to check against.
* Now on Homebrew for Mac OS
* Now on the Arch User Repository
#### Bug Fixes
* Morse code is now better optimised and works across multiple different Morse alphabets.
* Fixed issue where Vigenere broke on inputs of equal signs.
* Fixed issue where dictionary.txt was too small.
* Updated stopwords & 1k words dictionaries.
#### Maintenance
* Tensorflow is reduced from a 500 MB install to a 1 MB install using TF Lite.
* Models are now parsed in C++
* More documentation written
* Changed Contributing file
* Created speed_test.py, which is used to help add new languages to Brandon checker.
* Added the JSON selection system for CipheyDists.
* The Ciphey main dictionary now supports UK, USA, AU, CAN dialects of English.
* Many, many more tests were added to the program.
* Targeting system added to main(); Ciphey can now internally target any cipher, whereas previously the cipher couldn't be manually chosen.
* The settings file is automatically searched for in APPDIRS.
* Moved the docs to its own dedicated GitHub Repo
* Used Terminalizer to record pretty gifs
* Redesigned the README
## 4.1
#### Features
* Vigenere is now enabled, due to massive performance gains from the C++ core

View File

@ -23,6 +23,8 @@ Encodings
* Hexadecimal
* Binary
* Morse Code
* Morse code with new lines
* Octal Decoding (Base8)
Hashes
-------
@ -34,4 +36,4 @@ Hashes
What Ciphers are going to be implemented next?
-----------------------------------------------
`See this GitHub issue <https://github.com/Ciphey/Ciphey/issues/63>`
`See this GitHub issue <https://github.com/Ciphey/Ciphey/issues/63>`_

View File

@ -19,11 +19,34 @@ Structure
.. code:: python
{
"ctext": "str: The ciphertext that is being attacked",
"grep": "bool: The greppable flag",
"info": "bool: The info flag",
"debug": "str: The loguru debug level",
"checker": "LanguageChecker: an instance of the selected language checker",
"wordlist": "AbstractSet[Str]: The selected wordlist",
"params": "Dict[Str, str]: The given module parameters"
"debug": "str: The loguru debug level, one of ['TRACE', 'DEBUG', 'WARNING', 'ERROR', None/~]",
"checker": "str: The name of the language checker class to be used",
"params": "Dict[str, Dict[str, Union[List[str], str]]]: The given module parameters, indexed by the module name and the param name",
"modules": "List[str]: Paths to modules that should be loaded",
"utility_threshold": "float: A value between 0 and 2 representing what Detectors should be used in the first pass",
"format": "Dict[str, str]: formed of 'in' and 'out', which map to the name of their respective types"
}
These are the defaults, represented as a YAML config file.
An omission of any field will result in these values being used
.. code:: yaml
grep: false
info: false
debug: WARNING
checker: brandon
format:
in: str
out: str
utility_threshold: 1.5
score_threshold: 0.8
The following internal modules are loaded by default, even if not specified:
* The ``brandon`` LanguageChecker
* The ``cipheydists`` collection of Distributions, CharSets and WordLists
* The ``json`` Distribution, CharSet and WordLists, that load in a json file
* The ``csv`` Distribution, CharSet and WordLists, that load in a csv file
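A sketch of the omission rule described above: merging a partial user config over the defaults only overrides the keys the user actually supplies (plain dicts stand in for the parsed YAML):

```python
# Defaults from the YAML above (subset, for illustration)
defaults = {"checker": "brandon", "debug": "WARNING",
            "format": {"in": "str", "out": "str"}}

# A user config that only sets one field
user_config = {"debug": "TRACE"}

config = dict(defaults)
for key, value in user_config.items():
    if value is not None:  # a None/~ value in YAML leaves the default in place
        config[key] = value

print(config["debug"])    # TRACE
print(config["checker"])  # brandon
```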

47
docs/features.rst Normal file
View File

@ -0,0 +1,47 @@
Features
==========
* 20+ encryptions supported :ciphers:`click here for the full list`.
* Advanced cipher targeting system, making use of artificial intelligence and common sense.
Are you sick of artificial intelligence bloating everything? We only use it when it is **absolutely necessary**. The common sense part is important. If we see a string like "010100010100000111" we assume it is binary.
* Custom built natural language processing module(s).
Language Checker checks to see if the given text is plaintext. We do this either with the Brandon checker (the default checker), the deep neural network, or regex.
* Regex checker
If you have text you know is plaintext, such as _HTB{e563d8ae4b557d21060bfeb2a06d5cb2}_, but which clearly won't be picked up as a language, use the regex checker.
* Multi language support
Also to note, Ciphey's default checker, Brandon, has multi language support and currently supports English & German.
* C++ Core
Ciphey has a C++ core for cryptanalysis tidbits. Python is very slow, but C++ is very fast. By offloading the brute-forcing to C++, we saw speed increases such as a 30% speedup for the Caesar cipher.
* Supports Hashes & Encryptions
Other online tools may only support encodings, hashes, or encryptions. Ciphey supports all of them!
* Tweakable
Ciphey has a settings.yml file. This file lets you tweak the internal procedures of Ciphey. Want to use the German dictionary for phase 1 of language checker, and then the English dictionary? No worries! You can do that.
Do you have a bunch of regexes, but hate inputting them manually? Store them in the settings file.
* Extensively tested with a lot of documentation
Every time Ciphey goes for a release, it gets tested by many hand-written unit tests. Then an automated testing system tests Ciphey 20,000 times over to make sure nothing breaks.
* Not opinionated
Base64 has an alternative syntax, but many online decoders only support the most popular one. Thus, they are opinionated.
Ciphey strays from this as much as possible. We try not to hold an opinion on anything we don't need to. Alternative syntaxes are available for many modules, and are automatically tested against, so you don't have to worry about whether Ciphey supports the variant you have.
* Easy to contribute to
Want to add a new language? We have an easy to follow guide on this documentation.
Want to add more decryption methods? Again, easy to follow guide.
Ciphey is designed to be as modular as possible, so anyone wishing to contribute simply has to push their module and Ciphey will work with it.
* Built by the CTF community, for the CTF community
Ciphey was originally built for the Geocaching community, but is now built mainly for the CTF community, although anyone can use it.
Cyclic3 & Brandon (core maintainers) are committee members of the Liverpool Cyber Security Society. Both regularly attend CTFs and win some too.
Brandon was #2 on the TryHackMe leaderboards.
All code contributors or maintainers have been in CTFs, and we are all very active in the CTF community.
Ciphey is built by the CTF community, for the CTF community.

docs/howWork.rst (new file)
@@ -0,0 +1,6 @@
How does Ciphey work? An in-depth guide
========================================
First, when Ciphey is run, it parses the arguments using the ``argparse`` library and/or manual parsing, depending on what is given to Ciphey.
Then, Ciphey sends the input text and arguments to the cipher detection interface.
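A minimal sketch of that first step (the names and flags here are illustrative, not Ciphey's actual CLI):

```python
import argparse

def parse_args(argv=None):
    # Simplified stand-in for Ciphey's argument parsing.
    parser = argparse.ArgumentParser(description="Automated decryption tool")
    parser.add_argument("-t", "--text", help="the text you want to decrypt")
    parser.add_argument("-g", "--greppable", action="store_true",
                        help="only print the answer")
    return parser.parse_args(argv)

# The parsed namespace is what gets handed to the detection interface.
args = parse_args(["-t", "aGVsbG8="])
```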

docs/lc2.rst (new file)
@@ -0,0 +1,58 @@
Brandon Interface
==================
The Brandon interface is the default language checking interface for Ciphey. So named because it is an algorithm created by Brandon, and we couldn't come up with any clever names for it at the time.
Contributing your own language
------------------------------
1. Get a dictionary of your language
2. Get stop words of your language
3. Get the top 1000 words of your language
4. Get the alphabet of your language
5. Get the frequency distribution of your language. We suggest taking a very popular large text (for English we used Charles Dickens' complete works) and calculating the frequency distribution yourself.
6. Add these to CipheyDists with the appropriate names and in appropriate folders.
7. Calculate the thresholds / sentence lengths using the program detailed in the section below.
8. Open a pull request and you're done!
How were the thresholds / sentence lengths chosen?
--------------------------------------------------
Brandon (the person) created a program to automatically test which checkers, sentence lengths, and thresholds were best for the newest version of Brandon checker.
The most important thing about the tests was "which is the best metric we can use as a phase 1 checker?" The tests consisted of:
* Lemmatisation
* Stop words
* Check 1000 words
* Word Endings
* Word endings with 3 chars
Each one was tested 20,000 times for accuracy & speed. Only stop words & check 1000 words survived this testing, both being high accuracy and incredibly fast.
Stopwords is a lot faster than check 1000 words, but on much smaller texts it has terrible accuracy; naturally, longer plaintexts contain more stop words.
Brandon therefore questioned whether it was worth checking the length of the text and switching checkers to increase accuracy whilst maintaining high speed.
Preliminary tests showed that it was. Stopwords had an accuracy of 85% on shorter texts, whereas check 1000 words had an accuracy of 97%. On much longer texts, stopwords had equal accuracy but was much faster.
A sentence is defined as "a single sentence from the corpus of Hansard.txt". The sentence lengths tested were 1, 2, 3, 4, 5 and 20.
After Brandon had found the best checkers for the various sentence counts, he calculated the mean len() of the texts at each count. This is as follows:
1 : The mean is 87.62
2 : The mean is 110.47925
3 : The mean is 132.20016666666666
4 : The mean is 154.817125
5 : The mean is 178.7297
20: The mean is 714.9188
Next, the question of percentage thresholds.
Brandon realised that hard-coding thresholds (such as 55%) was a bad idea. Surely there exist ideal thresholds that optimise the accuracy of the checker, and surely these thresholds change with the sentence length (stopwords would need a higher threshold for smaller texts, but as the text size increases it can use a lower threshold).
This means that both the threshold and the checker change depending on the text size.
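Concretely, that length-dependent selection can be sketched as a lookup over the breakpoints shipped in the default settings file (the helper name is illustrative):

```python
# Default phase-1 breakpoints: below 110 chars use "check 1000 words" with a
# 2% threshold, then switch to stop words with thresholds of 15% and 28%.
PHASE1 = {0: ("check", 0.02), 110: ("stop", 0.15), 150: ("stop", 0.28)}

def select_phase1(text):
    # Take the entry with the largest breakpoint not exceeding the text length.
    chosen = None
    for length in sorted(PHASE1):
        if len(text) >= length:
            chosen = PHASE1[length]
    return chosen
```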
What languages are supported?
-----------------------------
* English
* German

docs/privacy.rst (new file)
@@ -0,0 +1,13 @@
How do I know you're not taking the plaintext and storing it?
=================================================================
A valid concern! Here are multiple ways to make sure we aren't taking your plaintext and storing it somewhere:
1. Read the source code on GitHub
2. Read the source code of the file you downloaded
3. Use Burp Suite to look at what is being sent to us
4. Use Wireshark to do the same
5. Understand that we are 2 university students and this is being run off a GitHub Education Pack plan. We literally do not have the resources to do anything bad.
6. If we did store the plaintext and our university found out, we would lose our degrees, and that's scary.
7. Checksums
8. We have no interest in the plaintext; it's not useful to us at all.
9. If you are still paranoid you can copy and paste the file, and open a GitHub issue with it.
10. We're mentors / presidents / committee members of many cyber security organisations. If we did something this stupid, we would lose not only our degrees but all credibility.

docs/settings.rst (new file)
@@ -0,0 +1,60 @@
The Settings File
=================
The settings file contains settings for Ciphey. In particular, you may want:
* Regex list. Have a list of regexes for the regex checker? Use the settings file.
* Default language. Hate how Ciphey always loads in English? Use the settings file to change the default language to whatever you want.
* Is the language checker not working how you want it to work? Fine-tune the details in the settings file.
Default settings file
---------------------
Save this as settings.yml in the appdirs location, which can be found by running ciphey -where or --where.
.. code-block:: shell

   ➜ python3 ciphey -where
   settings.yml should be placed in /home/bee/.config/ciphey

From this example, we can see that we need to place the settings file at /home/bee/.config/ciphey/settings.yml.
The settings file follows a specific format. **Copy and paste this below!**
.. code-block:: yaml

   ---
   language_checker_options:
     # The language checking options. Basically, this detects plaintext.
     default_language: "english" # What language do you want to use?
     default_checker: "brandon"
     english:
       dict_name: english # the name of the dict in cipheyDists
       stopwords_name: english # The name of the stopwords set in cipheyDists
       brandon: # The brandon checker, the default checker
         thresholds:
           # Sentence length: {Checker: percentage threshold}
           # Want to know how these numbers were selected? Read the docs here TODO
           "Phase 1": {0: {"check": 0.02}, 110: {"stop": 0.15}, 150: {"stop": 0.28}}
           "Phase 2": {0: 0.55} # phase 2 threshold
     german:
       brandon:
         dict_name: german
         stopwords_name: german
         thresholds:
           0.55
   regexFile:
     # Put your custom REGEX here
     # These 4 REGEX's cover the most popular CTF flag formats.
     # {.*} means "any text of any size here" and /i means "ignore case".
     # For example, for the CTF NoobCTF the format would be /NoobCTF{.*}/i
     - /HTB{.*}/i # TODO HTB strings are just md5s
     - /THM{.*}/i
     - /FLAG{.*}/i
     - /CTF{.*}/i
Some of the notable options you may want to change:
* Default language
* Default checker
And to add more regex, simply list them under the others.
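For example, to add a flag format for a hypothetical "NoobCTF" event (the last pattern is illustrative), just append it to the list:

.. code-block:: yaml

   regexFile:
     - /HTB{.*}/i
     - /THM{.*}/i
     - /FLAG{.*}/i
     - /CTF{.*}/i
     - /NoobCTF{.*}/i # your additions go at the end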

entry_point.py (new file)
@@ -0,0 +1,6 @@
# Entry point used for PyInstaller
from ciphey.__main__ import main

if __name__ == "__main__":
    main()

entry_point.spec (new file)
@@ -0,0 +1,33 @@
# -*- mode: python ; coding: utf-8 -*-
block_cipher = None
a = Analysis(['entry_point.py'],
pathex=['/home/bee/Documents/Ciphey'],
binaries=[],
datas=[],
hiddenimports=[],
hookspath=[],
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
cipher=block_cipher)
exe = EXE(pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
[],
name='entry_point',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
upx_exclude=[],
runtime_tmpdir=None,
console=True )

pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "ciphey"
version = "4.2.1"
version = "5.0.0rc1"
description = "Automated Decryption Tool"
authors = ["Brandon <brandon@skerritt.blog>"]
license = "MIT"
@@ -8,20 +8,21 @@ documentation = "https://docs.ciphey.online"
exclude = ["tests/hansard.txt"]
[tool.poetry.dependencies]
python = "^3.6"
python = "^3.7"
tensorflow = "^2.1.0"
rich = "^1.2.3"
loguru = "^0.5.0"
pylint = "^2.5.2"
flake8 = "^3.8.2"
cipheydists = "^0.2.2"
cipheycore = "^0.1.5"
tflite = "^2.2.0"
cipheydists = "^0.3.5"
cipheycore = "^0.2.2"
appdirs = "^1.4.4"
typing_inspect = { version = "^0.6.0", python = "~3.6 || ~3.7" }
base58 = "^2.0.1"
pybase62 = "^0.4.3"
click = "^7.1.2"
click-option-group = "^0.5.1"
click-completion = "^0.5.2"
click-spinner = "^0.1.10"
pyyaml = "^5.3.1"
[tool.poetry.dev-dependencies]
pytest-cov = "^2.9.0"

@@ -1,26 +1,34 @@
---
language_checker_options:
# The language checking options. Basically, this detects plaintext.
default_language: cipheydists::english # What language do you want to use?
default_checker: brandon
english:
dict_name: cipheydists::english # the name of the dict in cipheyDists
# Change these to choose your default arguments.
arguments:
greppable: False # is the output grepped?
cipher: False # Do you want extra information on the cipher used?
a: 'brandon' # the default language detection interface
A: None # The script / module at <path> containing _ciphey_accecptor_ variable
p: "value" # Sets the kwarg <param> to <value> when the is_acceptable method is called
wordlist: None # Default wordlist file. Set None to a path such as "/usr/share/wordlists/rockyou.txt"
t: None # The text you want to decrypt
j: Fale # Do you want to run potential candidates against wordlist? Useful for when you think the output is a password. Must set wordlist to use.
s: False # skip the neural network and only use simple filtration system
printWorkingDirectory: False # tells you where this settings file is
stopwords_name: cipheydists::englishStopWords # The name of the stopwords set in cipheyDists
top1000: cipheydists::english1000
brandon: # The brandon checker, the default checker
thresholds:
# Sentence length: {Checker: percentage threshold}
# Want to know how these numbers were selected? Read the docs here TODO
"Phase 1": {0: {"check": 0.02}, 110: {"stop": 0.15}, 150: {"stop": 0.28}}
"Phase 2": {0: {"dict": 0.92}, 75: {"dict": 0.80}, 110: {"dict": 0.65}, 150: {"dict": 0.55}, 190: {"dict": 0.38}} # phase 2 threshold
regexFile: # set to path to Regex file to run custom regex
german:
brandon:
dict_name: german
stopwords_name: german
thresholds:
0.55
regexFile:
# Put your custom REGEX here
# These 3 REGEX's cover the most popular CTF flag formats.
- /HTB{.*}/i
# These 4 REGEX's cover the most popular CTF flag formats.
# {.*} means "any text of any size here" and /i means "ignore case".
# For example, for the CTF NoobCTF the format would be /NoobCTF{.*}/i
- /HTB{.*}/i # TODO HTB strings are just md5s
- /THM{.*}/i
- /FLAG{.*}/i
- /CTF{.*}/i

tests/brandon_interface.md (new file)
@@ -0,0 +1,354 @@
If I'm reading this correctly:
> I would suggest a simple lower bound test: we pass if we get more than 25%,and fail if we get lower than 5% (or smth idk) for n consecutive windows.
You're suggesting that we run all tests and see if we get more than 25%? IMO that would be much slower. What do you mean by `n windows`?
Okay, Chi squared is out then!
> Perhaps we can return an object from the cracker which states what tests have been performed, to save time on redundant analysis. With such information, brandon could make an intelligent decision to just use a wordlist if enough analysis was performed, and the more detailed analysis if it wasn't.
This is entirely possible. I will add support to the `brandon` checker to skip phase 1 if it receives a dictionary with the key `"phase1": True` (`True` == skip phase 1).
If you have more tests, let me know and I can factor them in.
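A minimal sketch of what that could look like (the phase-1/phase-2 internals here are stand-ins; only the `{"phase1": True}` key is from the proposal):

```python
STOPWORDS = {"the", "and", "is", "of", "a", "to", "in"}

def phase_one(text):
    # Stand-in fast filter: does any stop word appear?
    return any(word in STOPWORDS for word in text.lower().split())

def phase_two(text):
    # Stand-in for the slower, more thorough dictionary check.
    return len(text.split()) > 0

def check(text, info=None):
    """Skip phase 1 when the cracker reports it was already covered."""
    info = info or {}
    if not info.get("phase1", False):  # "phase1": True == skip phase 1
        if not phase_one(text):
            return False
    return phase_two(text)
```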
In your first reply:
https://github.com/Ciphey/Ciphey/issues/90#issuecomment-645046918
Point 3:
> Be aware that the stuff passed to the checker will most likely be complete gibberish (with a similar freq dist) OR the correct result. A user will not care about an extra second spent on the final correct result, but really will care that every false candidate takes an extra second. The current suggestion seems to be pessimal for the gibberish inputs: maybe add some sanity checks (have I failed to match any word, have I failed to lemmatise any word, etc.)
I decided to test how well `lem` worked as phase 1. To do this, I created this program:
```python
"""
TL;DR
Tested over 20,000 times
Maximum sentence size is 15 sentences
1/2 chance of getting 'gibberish' (encrypted text)
1/2 chance of getting English text
Each test is timed using Time module.
The accuracy is calculated as to how many true positives we get over the entire run
"""
import spacy
import random
import time
from statistics import mean
import ciphey
import enciphey
from alive_progress import alive_bar
nlp = spacy.load("en_core_web_sm")
f = open("hansard.txt", encoding="ISO-8859-1").read()
f = f.split(".")
enciph = enciphey.encipher()
def lem(text):
sentences = nlp(text)
return set([word.lemma_ for word in sentences])
def get_random_sentence():
if random.randint(0, 1) == 0:
x = None
while x is None:
x = (True, " ".join(random.sample(f, k=random.randint(1, 50))))
return x
else:
x = None
while x is None:
x = enciph.getRandomEncryptedSentence()
x = x["Encrypted Texts"]["EncryptedText"]
return (False, x)
# Now to time it and take measurements
def perform():
# calculate accuracy
total = 0
true_returns = 0
# calculate aveager time
time_list = []
# average sentance size
sent_size_list = []
items = range(20000)
with alive_bar(len(items)) as bar:
for i in range(0, 20000):
sent = get_random_sentence()
text = sent[1]
truthy = sent[0]
sent_size_list.append(len(text))
# should be length of chars
old = len(text)
# timing the function
tic = time.perf_counter()
new = lem(text)
tok = time.perf_counter()
# checking for accuracy
new = len(new)
# the and here means we only count True Positives
if new < old and truthy:
true_returns += 1
total += 1
# appending the time
t = tok - tic
time_list.append(t)
bar()
print(
f"The accuracy is {str((true_returns / total) * 100)} \n and the time it took is {str(round(mean(time_list), 2))}. \n The average string size was {str(mean(sent_size_list))}"
)
perform()
```
The results were fascinating, to say the least.
With a 50/50 chance of the text being gibberish (ciphertext from enCiphey) or sentences from Hansard.txt, we had these results for using lemmatisation as phase 1:
```
The accuracy is 49.63%
and the time it took is 0.02 seconds on average.
The average string size was 1133.63255.
```
**We get a 50% accuracy with a speed of 0.02 seconds on average, across 20k tests with the average size of a string being 1133 chars.**
The accuracy is quite bad, considering that a coin flip is also 50/50.
On average, the user would expect phase 2 to be entered 50% of the time, which is annoying as phase 2 is quite slow, even though phase 1 by itself is quite fast.
I am going to build the "2nd phase" of phase 1 using the While Loop we saw earlier. If we can combine just one more metric, we would see much higher accuracy and again - likely incredibly low latency.
I will create a table of my results:
## Table of max sentence length == 50
| Name | Speed | Accuracy | String Size Average Chars | Epochs | Max Sentence Size |
| -------------------------- | ---------------------------- | -------- | ------------------------- | ------ | ----------------- |
| Lemmization (lem) | 0.02 seconds | 50% | 1580 | 20,000 | 50 |
| Stop word removal | 3.05465052884756e-05 seconds | 96% | 1596 | 20,000 | 50 |
| Check1000Words | 0.0005 seconds | 96% | 1597 | 20,000 | 50 |
| Word endings | 0.0009 seconds | 95% | 1597 | 20,000 | 50 |
## Table of max sentence length == 5
| Name | Speed | Accuracy | String Size Average Chars | Epochs | Max Sentence Size |
| -------------------------- | ------------------------------ | -------- | ------------------------- | ------ | ----------------- |
| Lemmization (lem) | | | | | |
| Stop word removal | 1.1574924453998391e-05 seconds | 93% | 569 | 20,000 | 5 |
| Check1000Words | 0.0006 seconds | 95% | 586 | 20,000 | 5 |
| Word endings | 0.0003 seconds | 92% | 482 | 20,000 | 5 |
## Table of max sentence length == 1
| Name | Speed | Accuracy | Threshold | String Size Average Chars | Epochs | Max Sentence Size |
| -------------------------- | ------------------------------- | -------- | --------- | ------------------------- | ------ | ----------------- |
| Lemmization (lem) | | | | | | |
| Stop word removal | 1.2532061150591289e-05 seconds | 50% | | 481 | 20,000 | 1 |
| Check1000Words | 0.0006 seconds | 95% | | 586 | 20,000 | 5 |
| Word endings | 0.0002 seconds | 86% | 15 | 482 | 20,000 | 1 |
## Confusion Matrices & Notes
### Lemmatisation
```
Positive Negative
Positive 10031 9967
Negative 2 0
```
### Stop Words
This test was performed without calling `.lower()` on the text, so the actual accuracy _may_ be a tiny bit higher, since the stop words list is all lowercase.
50 sentence limit
```
Positive Negative
Positive 9913 855
Negative 56 9176
```
5 sentence limit:
```
Positive Negative
Positive 9513 967
Negative 530 8990
```
### Check 1000 words
50 sentence limit
```
Positive Negative
Positive 10008 552
Negative 56 9384
```
5 sentence limit
```
Positive Negative
Positive 9563 597
Negative 397 9443
```
# Analysis
**I believe that the best Brandon checker will look at the length of the text, and adjust the % threshold and the exact phase 1 checker per text.**
The below data is taken from calculations performed over many hours. It shows the best threshold % for the best phase 1 checker with the highest accuracy. These checkers were chosen because the others showed a maximum accuracy of 58%.
```
{'check 1000 words': {1: {'Accuracy': 0.925, 'Threshold': 2},
2: {'Accuracy': 0.95, 'Threshold': 68},
3: {'Accuracy': 0.975, 'Threshold': 62},
4: {'Accuracy': 0.98, 'Threshold': 5},
5: {'Accuracy': 0.985, 'Threshold': 54}},
'stop words': {1: {'Accuracy': 0.865, 'Threshold': 50},
2: {'Accuracy': 0.93, 'Threshold': 19},
3: {'Accuracy': 0.965, 'Threshold': 15},
4: {'Accuracy': 0.97, 'Threshold': 28},
5: {'Accuracy': 0.985, 'Threshold': 29}}
```
Where the numbers are:
```
1 : The mean is 87.62
2 : The mean is 110.47925
3 : The mean is 132.20016666666666
4 : The mean is 154.817125
5 : The mean is 178.7297
```
Looking at this test, it is clear that stopwords beats check 1000 words on speed, but its accuracy is a little lower. Stop words is far faster than check 1k words, but on smaller inputs the stopwords checker breaks down.
Therefore, we should use the stopwords checker on larger texts, and check 1k words on smaller texts.
More specifically, stopwords checker for len == 110 has an optimal threshold of 19, whereas check 1k words has an optimal threshold of 68. This means that while stopwords can potentially end earlier and only search the first 19% of the list, check 1k words would search 68% of the list.
Stopwords has a lower accuracy by 2%, but it is much, much faster and its optimal threshold is greatly reduced.
So ideally, we would have this algorithm:
1. Sentence length less than 110:
   1. Use check 1k words with a threshold of 2%
2. Sentence length > 110:
   1. Use stopwords with a threshold of 15
3. Sentence length > 150:
   1. Stopwords threshold increases to 28
This is the ideal optimal phase 1 algorithm for `brandon` checker.
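That dispatch can be sketched as follows (illustrative; `check_1k_ratio` and `stopword_ratio` stand in for the real checkers and return the percentage matched):

```python
def phase1(text, check_1k_ratio, stopword_ratio):
    """Sketch of the ideal phase-1 algorithm: pick checker and threshold by length."""
    n = len(text)
    if n < 110:
        return check_1k_ratio(text) >= 2   # check 1k words, 2% threshold
    if n < 150:
        return stopword_ratio(text) >= 15  # stopwords, threshold 15
    return stopword_ratio(text) >= 28      # stopwords, threshold 28
```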
# Phase 2
Phase 2 is the dictionary checker.
Firstly, we check to find the best thresholds for the dictionary checker.
```
'checker': {1: {'Accuracy': 0.97, 'Threshold': 99},
2: {'Accuracy': 0.98, 'Threshold': 98},
3: {'Accuracy': 0.965, 'Threshold': 68},
4: {'Accuracy': 0.99, 'Threshold': 93},
5: {'Accuracy': 0.97, 'Threshold': 92}},
```
The accuracies are good, but the thresholds are simply too high. We're overfitting!
To fix this, I reasoned that because the dictionary contained words of <= 2 chars such as "a" or "an", it was inflating the match percentage, resulting in a much higher threshold.
So I only let the checker consider words that are more than 2 chars.
This is the result:
```
'checker': {1: {'Accuracy': 0.965, 'Threshold': 60},
2: {'Accuracy': 0.98, 'Threshold': 77},
3: {'Accuracy': 0.985, 'Threshold': 67},
4: {'Accuracy': 0.985, 'Threshold': 99},
5: {'Accuracy': 0.98, 'Threshold': 47}},
```
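The word-length filter described above amounts to something like this (a sketch; the dictionary here is a stand-in for the real wordlist):

```python
def dict_ratio(text, dictionary):
    """Percentage of words found in the dictionary, ignoring words of <= 2 chars."""
    words = [w for w in text.lower().split() if len(w) > 2]
    if not words:
        return 0.0
    return 100 * sum(w in dictionary for w in words) / len(words)
```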
The accuracy stayed around the same, but the threshold went down, although it was still rather high. A 99% threshold for length 4? I restricted the threshold to 75% and:
```
'checker': {1: {'Accuracy': 0.945, 'Threshold': 66},
2: {'accuracy': 0.975, 'threshold': 69},
3: {'accuracy': 0.98, 'threshold': 71},
4: {'accuracy': 0.99, 'threshold': 65},
5: {'accuracy': 0.98, 'threshold': 38}},
```
We can see that the accuracy stayed roughly the same, but the threshold went down a lot. The mean threshold is about 62%.
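For reference, computing the mean of those thresholds gives:

```python
from statistics import mean

# Thresholds from the 75%-capped run: sentence lengths 1-5.
print(mean([66, 69, 71, 65, 38]))  # → 61.8
```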
However, the accuracy for smaller sentence sizes tanked.
The highest accuracy we had was with the original one. Words <= 2 chars and no limit on threshold.
If possible, we want to combine the high accuracy on smaller texts while maintaining the generalisation found in the latter checker results.
The reason we want a smaller threshold is that due to the chunking procedure, it will be much faster on larger texts. The lower the sentence length the higher the threshold is allowed to be.
For phase 2, we are not concerned with speed. We are however concerned with accuracy.
I believe that threshold > 90% is overfitting. I cannot reasonably see this successfully working within Ciphey itself.
My next test will be max threshold of 100% with no chars less than or equal to 1.
```
'checker': {1: {'Accuracy': 0.97, 'Threshold': 93},
2: {'Accuracy': 0.975, 'Threshold': 82},
3: {'Accuracy': 0.97, 'Threshold': 96},
4: {'Accuracy': 0.965, 'Threshold': 31},
5: {'Accuracy': 0.965, 'Threshold': 74}},
```
The accuracy is 97% with a threshold of 93. This is much higher than the previous test. I think for shorter texts, since we don't care about speed, we should use a higher threshold. This test was run 20,000 times. I will run the tests once more to see if the threshold significantly changes.
The test results were:
```
'checker': {1: {'Accuracy': 0.96, 'Threshold': 92},
2: {'Accuracy': 0.97, 'Threshold': 95},
3: {'Accuracy': 0.965, 'Threshold': 81},
4: {'Accuracy': 0.96, 'Threshold': 38},
5: {'Accuracy': 0.975, 'Threshold': 52}},
```
One last test. No threshold limit with no char limit.
```
'checker': {1: {'Accuracy': 0.98, 'Threshold': 92},
2: {'Accuracy': 0.99, 'Threshold': 91},
3: {'Accuracy': 0.97, 'Threshold': 83},
4: {'Accuracy': 0.97, 'Threshold': 71},
5: {'Accuracy': 0.975, 'Threshold': 74}},
```
In total, we want these ones:
```
{1: {'Accuracy': 0.98, 'Threshold': 92},
2: {'accuracy': 0.975, 'threshold': 69},
3: {'accuracy': 0.98, 'threshold': 71},
4: {'accuracy': 0.99, 'threshold': 65},
5: {'accuracy': 0.98, 'threshold': 38}},
^^ with 75% threshold limit
```
Lower thresholds, accuracies look good too.

@@ -1,4 +1,4 @@
from ciphey.LanguageChecker.brandon import Brandon
from ciphey.basemods.Checkers.brandon import Brandon
config = dict()
lc = config["checker"](config)
import unittest

@@ -20,7 +20,7 @@ class encipher:
def __init__(self): # pragma: no cover
"""Inits the encipher object """
self.text = self.read_text()
self.MAX_SENTENCE_LENGTH = 20
self.MAX_SENTENCE_LENGTH = 5
# ntlk.download("punkt")
self.crypto = encipher_crypto()
@@ -30,13 +30,13 @@
splits = nltk.tokenize.sent_tokenize(x)
return splits
def getRandomSentence(self): # pragma: no cover
def getRandomSentence(self, size): # pragma: no cover
return TreebankWordDetokenizer().detokenize(
random.sample(self.text, random.randint(1, self.MAX_SENTENCE_LENGTH))
random.sample(self.text, random.randint(1, size))
)
def getRandomEncryptedSentence(self): # pragma: no cover
sents = self.getRandomSentence()
def getRandomEncryptedSentence(self, size): # pragma: no cover
sents = self.getRandomSentence(size)
sentsEncrypted = self.crypto.randomEncrypt(sents)
return {"PlainText Sentences": sents, "Encrypted Texts": sentsEncrypted}

@@ -11,113 +11,113 @@ class testIntegration(unittest.TestCase):
"""
def test_basics(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage(
lc = LanguageChecker.Checker()
result = lc.check(
"Hello my name is new and this is an example of some english text"
)
self.assertEqual(result, True)
def test_basics_german(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("hallo keine lieben leute nach")
lc = LanguageChecker.Checker()
result = lc.check("hallo keine lieben leute nach")
self.assertEqual(result, False)
def test_basics_quickbrownfox(self):
"""
This returns true because, by default, chi squared returns true so long as it has processed fewer than 10 items
"""
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("The quick brown fox jumped over the lazy dog")
lc = LanguageChecker.Checker()
result = lc.check("The quick brown fox jumped over the lazy dog")
self.assertEqual(result, True)
def test_basics_quickbrownfox(self):
"""
This returns true because, by default, chi squared returns true so long as it has processed fewer than 10 items
"""
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("The quick brown fox jumped over the lazy dog")
lc = LanguageChecker.Checker()
result = lc.check("The quick brown fox jumped over the lazy dog")
self.assertEqual(result, True)
def test_chi_maxima_true(self):
"""
This returns false because s.d is not over 1 as all inputs are English
"""
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("sa dew fea dxza dcsa da fsa d")
result = lc.checkLanguage("df grtsf a sgrds fgserwqd")
result = lc.checkLanguage("fd sa fe safsda srmad sadsa d")
result = lc.checkLanguage(" oihn giuhh7hguygiuhuyguyuyg ig iug iugiugiug")
result = lc.checkLanguage(
lc = LanguageChecker.Checker()
result = lc.check("sa dew fea dxza dcsa da fsa d")
result = lc.check("df grtsf a sgrds fgserwqd")
result = lc.check("fd sa fe safsda srmad sadsa d")
result = lc.check(" oihn giuhh7hguygiuhuyguyuyg ig iug iugiugiug")
result = lc.check(
"oiuhiuhiuhoiuh7 a opokp[poj uyg ytdra4efriug oih kjnbjhb jgv"
)
result = lc.checkLanguage("r jabbi tb y jyg ygiuygytff u0")
result = lc.checkLanguage("ld oiu oj uh t t er s d gf hg g h h")
result = lc.checkLanguage(
result = lc.check("r jabbi tb y jyg ygiuygytff u0")
result = lc.check("ld oiu oj uh t t er s d gf hg g h h")
result = lc.check(
"posa idijdsa ije i vi ijerijofdj ouhsaf oiuhas oihd "
)
result = lc.checkLanguage(
result = lc.check(
"Likwew e wqrew rwr safdsa dawe r3d hg jyrt dwqefp ;g;;' [ [sadqa ]]."
)
result = lc.checkLanguage("Her hyt e jytgv urjfdghbsfd c ")
result = lc.checkLanguage("CASSAE X T H WAEASD AFDG TERFADDSFD")
result = lc.checkLanguage("das te y we fdsbfsd fe a ")
result = lc.checkLanguage("d pa pdpsa ofoiaoew ifdisa ikrkasd s")
result = lc.checkLanguage(
result = lc.check("Her hyt e jytgv urjfdghbsfd c ")
result = lc.check("CASSAE X T H WAEASD AFDG TERFADDSFD")
result = lc.check("das te y we fdsbfsd fe a ")
result = lc.check("d pa pdpsa ofoiaoew ifdisa ikrkasd s")
result = lc.check(
"My friend is a really nice people who really enjoys swimming, dancing, kicking, English."
)
self.assertEqual(result, True)
def test_integration_unusual_one(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("HELLO MY NAME IS BRANDON AND I LIKE DOLLAR")
lc = LanguageChecker.Checker()
result = lc.check("HELLO MY NAME IS BRANDON AND I LIKE DOLLAR")
self.assertEqual(result, True)
def test_integration_unusual_two(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
lc = LanguageChecker.Checker()
result = lc.check("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
self.assertEqual(result, False)
def test_integration_unusual_three(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("password")
lc = LanguageChecker.Checker()
result = lc.check("password")
self.assertEqual(result, True)
def test_integration_unusual_three(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("")
lc = LanguageChecker.Checker()
result = lc.check("")
self.assertEqual(result, False)
def test_integration_unusual_four(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage(".")
lc = LanguageChecker.Checker()
result = lc.check(".")
self.assertEqual(result, False)
def test_integration_unusual_five(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("#")
lc = LanguageChecker.Checker()
result = lc.check("#")
self.assertEqual(result, False)
def test_integration_unusual_7(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage(
lc = LanguageChecker.Checker()
result = lc.check(
"999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999"
)
self.assertEqual(result, False)
def test_integration_unusual_7(self):
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("")
lc = LanguageChecker.Checker()
result = lc.check("")
self.assertEqual(result, False)
def test_integration_addition(self):
"""
Makes sure you can add 2 language objects together
"""
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("hello my darling")
lc = LanguageChecker.Checker()
result = lc.check("hello my darling")
lc2 = LanguageChecker.LanguageChecker()
result = lc.checkLanguage("sad as dasr as s")
lc2 = LanguageChecker.Checker()
result = lc.check("sad as dasr as s")
temp = lc.getChiScore()
temp2 = lc2.getChiScore()
@@ -132,6 +132,6 @@ class testIntegration(unittest.TestCase):
Bug is that chi squared does not score this as True
"""
text = """Charles Babbage, FRS (26 December 1791 - 18 October 1871) was an English mathematician, philosopher, inventor and mechanical engineer who originated the concept of a programmable computer. Considered a "father of the computer", Babbage is credited with inventing the first mechanical computer that eventually led to more complex designs. Parts of his uncompleted mechanisms are on display in the London Science Museum. In 1991, a perfectly functioning difference engine was constructed from Babbage's original plans. Built to tolerances achievable in the 19th century, the success of the finished engine indicated that Babbage's machine would have worked. Nine years later, the Science Museum completed the printer Babbage had designed for the difference engine."""
lc = LanguageChecker.LanguageChecker()
result = lc.checkLanguage(text)
lc = LanguageChecker.Checker()
result = lc.check(text)
self.assertEqual(result, True)

tests/speed_test.py (new file)
@@ -0,0 +1,417 @@
"""
TL;DR
Tested over 20,000 times
Maximum sentence size is 15 sentences
1/2 chance of getting 'gibberish' (encrypted text)
1/2 chance of getting English text
Each test is timed using the time module.
The accuracy is calculated as the proportion of correct classifications (true positives and true negatives) over the entire run
"""
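The methodology above can be sketched as a minimal, self-contained harness. `time_classifier` and the toy vowel classifier are illustrative stand-ins, not part of this test suite:

```python
import time
from statistics import mean

def time_classifier(classify, samples):
    """Time a classifier over labelled samples; return (accuracy, mean time).

    samples: list of (text, is_english) pairs; classify: text -> bool.
    """
    times = []
    correct = 0
    for text, truth in samples:
        tic = time.perf_counter()
        result = classify(text)
        toc = time.perf_counter()
        times.append(toc - tic)
        # both true positives and true negatives count as correct
        if result == truth:
            correct += 1
    return correct / len(samples), mean(times)

# toy classifier: treat anything containing a vowel as "English"
samples = [("hello world", True), ("xzqj", False)]
accuracy, avg_time = time_classifier(lambda t: any(v in t for v in "aeiou"), samples)
```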
import spacy
import random
import time
from statistics import mean
import ciphey
import enciphey
from alive_progress import alive_bar
from spacy.lang.en.stop_words import STOP_WORDS
import cipheydists
import cipheycore
import pprint
from math import ceil
class tester:
def __init__(self):
self.nlp = spacy.load("en_core_web_sm")
self.f = open("hansard.txt", encoding="ISO-8859-1").read()
self.f = self.f.split(".")
# self.analysis = cipheycore.start_analysis()
# for word in self.f:
# cipheycore.continue_analysis(self.analysis, word)
# cipheycore.finish_analysis(self.analysis)
self.enciph = enciphey.encipher()
# all stopwords
self.all_stopwords = set(self.nlp.Defaults.stop_words)
self.top1000Words = cipheydists.get_list("english1000")
self.wordlist = cipheydists.get_list("english")
self.endings = set(
[
"al",
"y",
"sion",
"tion",
"ize",
"ic",
"ious",
"ness",
"ment",
"ed",
"ify",
"ence",
"fy",
"less",
"ance",
"ship",
"ate",
"dom",
"ist",
"ish",
"ive",
"en",
"ical",
"ful",
"ible",
"ise",
"ing",
"ity",
"ism",
"able",
"ty",
"er",
"or",
"esque",
"acy",
"ous",
]
)
self.endings_3_letters = set(filter(lambda x: len(x) == 3, self.endings))
self.best_thresholds = {
"word endings": {
1: {"Threshold": 0, "Accuracy": 0},
2: {"Threshold": 0, "Accuracy": 0},
3: {"Threshold": 0, "Accuracy": 0},
4: {"Threshold": 0, "Accuracy": 0},
5: {"Threshold": 0, "Accuracy": 0},
},
"word endings with just 3 chars": {
1: {"Threshold": 0, "Accuracy": 0},
2: {"Threshold": 0, "Accuracy": 0},
3: {"Threshold": 0, "Accuracy": 0},
4: {"Threshold": 0, "Accuracy": 0},
5: {"Threshold": 0, "Accuracy": 0},
},
"stop words": {
1: {"Threshold": 0, "Accuracy": 0},
2: {"Threshold": 0, "Accuracy": 0},
3: {"Threshold": 0, "Accuracy": 0},
4: {"Threshold": 0, "Accuracy": 0},
5: {"Threshold": 0, "Accuracy": 0},
},
"check 1000 words": {
1: {"Threshold": 0, "Accuracy": 0},
2: {"Threshold": 0, "Accuracy": 0},
3: {"Threshold": 0, "Accuracy": 0},
4: {"Threshold": 0, "Accuracy": 0},
5: {"Threshold": 0, "Accuracy": 0},
},
"checker": {
1: {"Threshold": 0, "Accuracy": 0},
2: {"Threshold": 0, "Accuracy": 0},
3: {"Threshold": 0, "Accuracy": 0},
4: {"Threshold": 0, "Accuracy": 0},
5: {"Threshold": 0, "Accuracy": 0},
},
}
# text = "hello my name is Bee and I really like flowers"
# def checker(self, text: str, threshold: float, text_length: int) -> bool:
# x = self.checker(text=text, threshold=0.55, text_length=len(text))
def lem(self, text, threshold):
sentences = self.nlp(text)
return set([word.lemma_ for word in sentences])
def stop(self, text, threshold):
    # return True as soon as any word is a stopword
    for word in text:
        if word in self.all_stopwords:
            return True
    return False
# x = [word for word in text if not word in self.all_stopwords]
# return True if len(x) < len(text) else False
def check1000Words(self, text, threshold):
"""Checks to see if any word is in the list of 1000 common words
The 1000-word list supports O(1) lookup
Args:
    text -> the words to test
Returns:
    bool -> whether any of the words is in the list.
"""
# If we have no wordlist, then we can't reject the candidate on this basis
if text is None:
return False
# If any of the top 1000 words in the text appear
# return true
for word in text:
# I was debating using any() here, but I think they're the
# same speed so it doesn't really matter too much
if word in self.top1000Words:
return True
return False
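The check above reduces to an `any()` over set membership. A hedged, self-contained sketch (the name `contains_common_word` and the miniature word set are illustrative stand-ins for the real english1000 list):

```python
# A hypothetical miniature word set standing in for the real english1000 list.
top_words = {"the", "and", "hello", "world"}

def contains_common_word(words):
    # set membership is O(1) per lookup, so the scan is O(len(words)) overall
    if words is None:
        return False
    return any(word in top_words for word in words)

found = contains_common_word(["hello", "xzqj"])  # "hello" is in the set
```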
def get_random_sentence(self, size):
# if random.randint(0, 1) == 0:
# x = None
# while x is None:
# x = (True, " ".join(random.sample(self.f, k=random.randint(1, size))))
# return x
# else:
# x = None
# while x is None:
# x = self.enciph.getRandomEncryptedSentence(size)
# x = x["Encrypted Texts"]["EncryptedText"]
# return (False, x)
x = (True, " ".join(random.sample(self.f, k=random.randint(1, size))))
return x
def get_words(self, text):
doc = self.nlp(text)
toReturn = []
for token in doc:
toReturn.append((token.text).lower())
return toReturn
def word_endings(self, text, threshold):
    total = len(text)
    if total == 0:
        return False
    positive = 0
    for word in text:
        for ending in self.endings:
            if word.endswith(ending):
                positive += 1
                break  # count each word at most once
    if positive == 0:
        return False
    return positive / total > threshold
def word_endings_3(self, text, threshold):
    """Word endings that are exactly 3 chars, which may be faster to compute"""
    total = len(text)
    if total == 0:
        return False
    # check each word against the short endings
    positive = sum(1 for word in text if word.endswith(tuple(self.endings_3_letters)))
    if positive == 0:
        return False
    return positive / total > threshold
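The suffix heuristic used by both functions above can be sketched stand-alone. The `endings` set here is a hypothetical subset of the full list, and `suffix_ratio` is an illustrative name:

```python
# A hypothetical, smaller set of suffixes standing in for the full endings set.
endings = {"ing", "tion", "ness", "able"}

def suffix_ratio(words):
    """Fraction of words ending in one of the known suffixes."""
    if not words:
        return 0.0
    hits = sum(1 for w in words if any(w.endswith(e) for e in endings))
    return hits / len(words)

ratio = suffix_ratio(["running", "station", "xzqj", "kindness"])  # 3 of 4 match
```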
# Now to time it and take measurements
def perform(self, function, sent_size, threshold):
threshold = threshold / 100
# calculate accuracy
total = 0
true_positive_returns = 0
true_negative_returns = 0
false_positive_returns = 0
false_negatives_returns = 0
# calculate average time
time_list = []
# average sentence size
sent_size_list = []
test_range = 200
for i in range(0, test_range):
sent = self.get_random_sentence(sent_size)
text = sent[1]
truthy = sent[0]
sent_size_list.append(len(text))
# should be length of chars
text = self.get_words(text)
old = len(text)
# timing the function
# def checker(self, text: str, threshold: float, text_length: int, var: set) -> bool:
tic = time.perf_counter()
result = function(text=text, threshold=threshold, text_length=old)
tok = time.perf_counter()
# new = len(result)
# print(
# f"The old text is \n {''.join(text)}\n and the new text is \n {''.join(result)} \n\n"
# )
# result = new < old
# checking for accuracy
# new = len(new)
# the and here means we only count True Positives
# result = new < old
if result and truthy:
true_positive_returns += 1
elif result:
false_positive_returns += 1
elif not result and truthy:
false_negatives_returns += 1
elif not result:
true_negative_returns += 1
else:
print("ERROR")
total += 1
# appending the time
t = tok - tic
time_list.append(t)
print(
f"The accuracy is {str((true_positive_returns+true_negative_returns) / total)} \n and the time it took is {str(mean(time_list))}. \n The average string size was {str(mean(sent_size_list))}"
)
print(
f"""
Positive Negative
Positive {true_positive_returns} {false_positive_returns}
Negative {false_negatives_returns} {true_negative_returns}
"""
)
return {
"Name": function,
"Threshold": threshold,
"Accuracy": (true_positive_returns + true_negative_returns) / total,
"Average_time": mean(time_list),
"Average_string_len": mean(sent_size_list),
"Sentence length": sent_size,
"confusion_matrix": [
[true_positive_returns, false_positive_returns],
[false_negatives_returns, true_negative_returns],
],
}
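The confusion-matrix bookkeeping in `perform` can be isolated into a small helper; the `confusion` function and the hard-coded `(result, truth)` pairs below are illustrative, not part of the tester:

```python
def confusion(outcomes):
    """Build a 2x2 confusion matrix and accuracy from (result, truth) pairs."""
    tp = fp = fn = tn = 0
    for result, truth in outcomes:
        if result and truth:
            tp += 1  # predicted English, was English
        elif result:
            fp += 1  # predicted English, was gibberish
        elif truth:
            fn += 1  # predicted gibberish, was English
        else:
            tn += 1  # predicted gibberish, was gibberish
    accuracy = (tp + tn) / len(outcomes)
    return [[tp, fp], [fn, tn]], accuracy

matrix, acc = confusion([(True, True), (True, False), (False, True), (False, False)])
```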
def perform_3_sent_sizes(self, threshold):
"""
Gives us the average accuracy and time etc
"""
# funcs = [self.checker, self.stop, self.check1000Words]
funcs = [self.checker]
# funcs = [self.word_endings]
names = [
"checker",
# "stop words",
# "check 1000 words",
]
# names = ["checker"]
sent_sizes = [1, 2, 3, 4, 5]
x = {
# "stop words": {1: None, 2: None, 3: None, 4: None, 5: None, 20: None},
# "check 1000 words": {1: None, 2: None, 3: None, 4: None, 5: None, 20: None},
"checker": {1: None, 2: None, 3: None, 4: None, 5: None, 20: None},
}
for i in range(0, len(funcs)):
func = funcs[i]
for y in sent_sizes:
# print("Hello this runsss")
x[names[i]][y] = self.perform(func, y, threshold)
return x
def perform_best_percentages(self):
"""
Tells us the optimal percentage thresholds
"""
"""
TODO I need to record thresholds for each length of text
"""
# "word endings with just 3 chars": {
# "Sentence Size": {"Threshold": 0, "Accuracy": 0}
# },
# "stop words": {"Sentence Size": {"Threshold": 0, "Accuracy": 0}},
# "check 1000 words": {"Sentence Size": {"Threshold": 0, "Accuracy": 0}},
# }
items = range(100)
with alive_bar(len(items)) as bar:
for i in range(1, 101):
x = self.perform_3_sent_sizes(threshold=i)
pprint.pprint(x)
for key, value in x.items():
# getting max keys
for y in [1, 2, 3, 4, 5]:
pprint.pprint(x[key])
# size = x[key][y]
size = y
# print(f"**** Size is {size}")
temp1 = x[key][y]["Accuracy"]
# print(f"Accuracy is {temp1}")
temp2 = self.best_thresholds[key][size]["Accuracy"]
if temp1 > temp2:
    # print(f"Self best is {self.best_thresholds[key][size]}")
    self.best_thresholds[key][size]["Threshold"] = i
    self.best_thresholds[key][size]["Accuracy"] = temp1
pprint.pprint(x)
bar()
pprint.pprint(self.best_thresholds)
def calculate_average_sentence_size(self):
sent_sizes = [1, 2, 3, 4, 5]
lengths = []
for x in sent_sizes:
for i in range(0, 2000):
y = self.get_random_sentence(x)
lengths.append(len(y[1]))
print(f"{x} : The mean is {mean(lengths)}")
def checker(self, text: str, threshold: float, text_length: int) -> bool:
"""Given text, determine whether it passes the checker.
The checker compares the text against a wordlist (dictionary words).
Args:
    text -> the words to check
    threshold -> the proportion of words that must be in the wordlist before we return True
    text_length -> the length of the text
Returns:
    bool -> True if the text passes the test, False if it fails."""
percent = ceil(text_length * threshold)
meet_threshold = 0
location = 0
end = percent
while location < text_length:
    # work through the text one threshold-sized chunk at a time
    to_analyse = text[location:end]
    for word in to_analyse:
        # if the word is in the wordlist, count it towards the threshold
        if word in self.wordlist:
            meet_threshold += 1
        # return True as soon as the threshold is met; if we're at 24%
        # and the threshold is 25%, we want to return True ASAP rather
        # than scanning the rest of the text
        if meet_threshold / text_length >= threshold:
            return True
    # advance to the next chunk
    location = end
    end += percent
return False
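The early-exit threshold idea can be sketched without the chunking machinery; the toy `wordlist` and the name `passes_threshold` are illustrative stand-ins for the real dictionary and method:

```python
# A toy wordlist standing in for the real dictionary.
wordlist = {"hello", "my", "darling", "world"}

def passes_threshold(words, threshold):
    if not words:
        return False
    hits = 0
    for word in words:
        if word in wordlist:
            hits += 1
        # early exit: return as soon as the ratio of known words is met
        if hits / len(words) >= threshold:
            return True
    return False
```

For "hello my darling" against a 0.55 threshold, the scan stops after the second word, since 2/3 of the words are already known.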
obj = tester()
# X = obj.perform_3_sent_sizes(50)
# x = obj.perform_best_percentages()
x = obj.calculate_average_sentence_size()


@@ -1,12 +1,10 @@
import sys
- from ciphey.LanguageChecker.brandon import Brandon
+ from ciphey.basemods.Checkers.brandon import Brandon
from ciphey.Decryptor.Encoding.encodingParent import EncodingParent
from ciphey.__main__ import make_default_config
import unittest
from loguru import logger
import cipheydists
config = make_default_config("")


@@ -1,4 +1,4 @@
- from ciphey.neuralNetworkMod.nn import NeuralNetwork
+ from ciphey.basemods.Decoder.nn import NeuralNetwork
import numpy
import unittest

16
winget.yml Normal file

@@ -0,0 +1,16 @@
Id: Ciphey.Ciphey
Publisher: Ciphey
Name: Ciphey
Version: 5
AppMoniker: Ciphey
MinOSVersion: 10.0.0.0
Description: Automated Decryption Tool
Homepage: https://www.github.com/ciphey/ciphey
License: MIT
LicenseUrl: https://opensource.org/licenses/MIT
InstallerType: exe
Installers:
- Arch: x64
Url: https://statics.teams.cdn.office.net/production-windows-x64/1.3.00.4461/Teams_windows_x64.exe
Sha256: 712f139d71e56bfb306e4a7b739b0e1109abb662dfa164192a5cfd6adb24a4e1
ManifestVersion: 0.1.0