Jump to content

My own encryption technique made in Python featuring RSA.

Poet129

Code:

Spoiler
from Crypto.PublicKey import RSA
from Crypto.Cipher import AES
from os import remove
from time import time
from sys import argv
def ifb(b):
    return int.from_bytes(b,'big')
def itb(i,l=None):
    if l==None:
        return int(i).to_bytes((i.bit_length()+7)//8,'big')
    else:
        return int(i).to_bytes(l,'big')
def trap(d,key,kl,loc):
    aesk=AES.new(d[:32],AES.MODE_CBC,iv=d[32:48])
    aes=aesk.encrypt(d[loc:])
    return itb(pow(ifb(d[:loc]+aes[:kl-loc]),key.e,key.n),kl)+aes[kl-loc:]
def door(d,key,kl,loc):
    add=itb(pow(ifb(d[:kl]),key.d,key.n),kl)
    aesk=AES.new(add[:32],AES.MODE_CBC,iv=add[32:48])
    return add[:loc]+aesk.decrypt(add[loc:kl]+d[kl:])
if __name__ == '__main__':
    start=time()
    try:
        filename=argv[1]
    except IndexError:
        filename=input('What file would you like to process?: ')
    try:
        keyfile=argv[2]
    except IndexError:
        keyfile=input('What Key file would you like to use to process?: ')
    try:
        key=RSA.import_key(open(keyfile,'rb').read())
    except FileNotFoundError:
        print('Key file not found.')
        exit()
    kl=key.size_in_bytes()
    c=open(filename,'rb').read()
    loc=48+(len(c)-len(c)//16*16)
    if filename[-5:]!='.aenc':
        ed=trap(c,key,kl,loc)
        open(filename+'.aenc','wb').write(ed)
    else:
        dd=door(c,key,kl,loc)
        open(filename[:-5],'wb').write(dd)
    remove(filename)
    print('Processing took '+str(int(time()-start))+' seconds to complete.')

 

Usage:

python AEncrypt.py Filename KeyFilename

To encrypt or decrypt is determined by using a custom extension '.aenc'.

Encryption can be achieved using a private or public key, while only a private key can decrypt. Note using the public key over a private key for encryption is faster, however both have the same output.

 

How it works:

Effectively RSA+AES without padding.

 

Benchmark:

I used a 100MB (100,000,000 Bytes) /dev/urandom file generated using dd.

Measured in seconds:

20,000 bit RSA: Public Key Encryption: 2 Private Key Encryption: 18 Decryption: 40.

4,096 bit RSA: Public Key Encryption: 1 Private Key Encryption: 2 Decryption: 2.

 

Pros:

Maximum output size will match input data size.*

Much faster than RSA, while providing asymmetric encryption.

Space savings compared to something like traditional RSA+AES.

 

Cons:

*Minimum input size should match key size.

If the first key length bytes of data are leaked as unencrypted the rest of the data is no longer protected.

Performs no checks on whether the data is correct after decryption. A check could be added by adding consistent data to the end of the file or right after the key encrypted data.

Link to comment
Share on other sites

Link to post
Share on other sites

From the python docs (random module)

Warning: The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module.

 

1 hour ago, Poet129 said:

It first seeds the python random number generator with the first bytes up to the key length of the read RSA key.

So RSA is only used here to generate keys and seed the non cryptographically secure random number generator.

 

1 hour ago, Poet129 said:

Using the seeded generator a random number of the size of the rest of the data is made. Then the random number is used to offset the rest of the data.

So this is basically a Caesar cipher where each byte has it's own offset, and this offset is coming from the non cryptographically secure random number generator.

 

So the main problem here is that python uses the Mersenne Twister as the core generator,  which is reversible. The cipher is not a strong one either.

Edited by shadow_ray
cypher vs cipher.. I'm dumb

ಠ_ಠ

Link to comment
Share on other sites

Link to post
Share on other sites

2 hours ago, shadow_ray said:

So RSA is only used here to generate keys and seed the non cryptographically secure random number generator.

The RSA is used to protect the seed used in the random number generator. The seed itself is derived from the beginning of the data up to the length of the key.

2 hours ago, shadow_ray said:

From the python docs (random module) Warning: The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module.

So this is basically a Caesar cipher where each byte has its own offset, and this offset is coming from the non cryptographically secure random number generator.

No, the offset of each byte is calculated by the byte in the same place in the random generated bytes. It is similar to the Caesar cipher but it doesn't have a consistent mapping for a byte of a particular value instead of the same place.

2 hours ago, shadow_ray said:

So the main problem here is that python uses the Mersenne Twister as the core generator,  which is reversible. The cypher is not a strong one either.

Are you saying that the random number generator is reversible without the seed or excessive guessing?

 

After reading a few articles it appears as if this is correct. I've not demonstrated this myself however. It requires that you seed 624 different instances (files) with the same seed. The likelihood that the first 512 or 2,500 (or other depending on key size) bytes are going to be the same across 624 different files is fairly unlikely. However not impossible... I'll be looking for another seedable random number generator that doesn't have such weaknesses. Based on the documentation the secrets module doesn't appear to be seedable.

 

As a temporary fix I've now included the encrypted key size bytes in the calculation of the seed. So both the unencrypted and encrypted versions are added together for the seed. This should limit the 624 different files to also needing to be encrypted with the same key.

Link to comment
Share on other sites

Link to post
Share on other sites

2 hours ago, Poet129 said:

The RSA is used to protect the seed used in the random number generator. The seed itself is derived from the beginning of the data up to the length of the key.

Ohh I see.. The beginning of the data, up to the length of the key, is encrypted using RSA.

aed = itb(pow(ifb(d[:kl]), key.e, key.n), kl)
sd = pow(ifb(d[:kl]), key.d, key.n)

This means that the first part of the message is well protected (given that there are no bugs in the code) but the rest is not.

 

2 hours ago, Poet129 said:

It requires that you seed 624 different instances (files) with the same seed.

No. My understanding is that the internal state of the generator can be determined by obtaining 624 32bit integers (2 496 bytes) from the pseudo random sequence. This means that if someone knows 2496bytes from the offset array the rest of the array can be calculated easily. With some compute power even less is enough. Furthermore, if the data follows some patterns (same values next to each other, natural language etc) then the offsets can be guessed more easily.

 

Heck, according to this article https://www.ambionics.io/blog/php-mt-rand-prediction knowing 2 ints from the sequence is enough to crack the generator without any brute force.

ಠ_ಠ

Link to comment
Share on other sites

Link to post
Share on other sites

4 minutes ago, shadow_ray said:

Ohh I see.. The beginning of the data, up to the length of the key, is encrypted using RSA.

aed = itb(pow(ifb(d[:kl]), key.e, key.n), kl)
sd = pow(ifb(d[:kl]), key.d, key.n)

This means that the first part of the message is well protected (given that there are no bugs in the code) but the rest is not.

 

No. My understanding is that the internal state of the generator can be determined by obtaining 624 32bit integers (2 496 bytes) from the pseudo random sequence. This means that if someone knows 2496bytes from the offset array the rest of the array can be calculated easily. With some compute power even less is enough. Furthermore, if the data follows some patterns (same values next to each other, natural language etc) then the offsets can be guessed more easily.

 

Heck, according to this article https://www.ambionics.io/blog/php-mt-rand-prediction knowing 2 ints from the sequence is enough to crack the generator without any brute force.

I'm looking into ISAAC. It looks promising. Anything wrong with it apart from seeming to cap at 1024 bytes rather than 2500 in terms of total seeds?

Link to comment
Share on other sites

Link to post
Share on other sites

2 minutes ago, Poet129 said:

I'm looking into ISAAC. It looks promising. Anything wrong with it apart from seeming to cap at 1024 bytes rather than 2500 in terms of total seeds?

I have no idea. You should probably ask this question on a cyber sec forum to reach more people who actually knows this stuff 🙂

ಠ_ಠ

Link to comment
Share on other sites

Link to post
Share on other sites

3 hours ago, shadow_ray said:

I have no idea. You should probably ask this question on a cyber sec forum to reach more people who actually knows this stuff 🙂

Well, I don't think moving to ISAAC is gonna happen it took 295 and 184 seconds to encrypt and decrypt 100MB.

Link to comment
Share on other sites

Link to post
Share on other sites

2 hours ago, Poet129 said:

Well, I don't think moving to ISAAC is gonna happen it took 295 and 184 seconds to encrypt and decrypt 100MB.

Added multiprocessing and optimized ISAAC by splitting into 3072 byte chunks. Currently 55, 66 seconds for encrypt and decrypt of 100MB.

Link to comment
Share on other sites

Link to post
Share on other sites

14 hours ago, Poet129 said:

Much faster than RSA

Doubtful considering you're literally just using an RSA library. The security of your method is only (at best) as good as the RSA encryption of the seed, likely much worse since the cypher you're using probably doesn't hold a candle to RSA encryption and there's no real reason to try and get the seed if you can just brute force the cypher. You compared this to 4096 bit RSA encryption without showing any evidence that it offers comparable security.

 

Considering your post history it seems like your approach to cryptographic "research" is to combine a radom assortment of methods you found on the internet without doing the due diligence of figuring out if and why it might be better or worse than common solutions. That's not how this works. In cryptography the math comes first.

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*

Link to comment
Share on other sites

Link to post
Share on other sites

8 hours ago, Poet129 said:

Added multiprocessing and optimized ISAAC by splitting into 3072 byte chunks. Currently 55, 66 seconds for encrypt and decrypt of 100MB.

Note that you should not be using RSA for encrypting 100 mb of data. RSA is meant for really small datasets as it's very slow the bigger the data. AES should be used for large dataset and streaming.

Link to comment
Share on other sites

Link to post
Share on other sites

On 6/7/2022 at 6:32 PM, Poet129 said:

Much faster than RSA, while providing asymmetric encryption.

That doesn't mean much, unless your algorithm provides at least the same level of security. Since you're using simple substitution with a random number generator that is documented as "completely unsuitable for cryptographic purposes" I dare say that is not the case. The full quote from Python's documentation is:

 

Quote

However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

 

Being deterministic means you can predict which future values are going to be produced by the algorithm. Given the encrypted file it might be possible to perform a type of statistical analysis on it that reveals the sequence of random numbers, which could then be used to reverse engineer the seed. Even more so if the attacker has a rough idea what is contained in the file (e.g. I know it's ZIP, so the first few bytes have to be a ZIP header).

 

For example, one of the reasons the allies were able to crack Enigma's cipher in WW2 was that every message ended with "H**l H****r". They couldn't read it, but they knew that the last few encrypted letters of the message must translate to this, which gave them insight into the message's cipher.

 

Some security related algorithms are also intentionally designed to be slow. This makes them harder to brute force, because attackers can't try every possible value without spending a lot of time. The faster your algorithm, the faster an attacker can go through every possible option. Especially these days with GPUs that can do thousands of operations in parallel per second.

 

To give you another example: Skype calls were "hacked", not because it's encryption algorithm was actually broken, but due to the fact that, for speed reasons, packages had a variable length. With some statistical analysis the length of these packages was enough to get a good enough guess which phonemes (word fragments) had generated them. This was enough to reconstruct the conversation.

 

The golden rule of encryption: Don't roll your own (unless you're a mathematician with a strong background in cryptography)

 

What you have created is heading towards a Stream Cipher. Basically: Take a seed to generate a unique, unpredictable sequence of random numbers (not the case here), then use each byte of the sequence to encrypt a byte of the input stream. Have a look here as a starting point: https://crypto.stackexchange.com/questions/64814/choosing-a-random-number-generator-algorithm-to-encrypt-a-message

Remember to either quote or @mention others, so they are notified of your reply

Link to comment
Share on other sites

Link to post
Share on other sites

On 6/8/2022 at 4:08 AM, Sauron said:

Doubtful considering you're literally just using an RSA library. The security of your method is only (at best) as good as the RSA encryption of the seed, likely much worse since the cypher you're using probably doesn't hold a candle to RSA encryption and there's no real reason to try and get the seed if you can just brute force the cypher. You compared this to 4096 bit RSA encryption without showing any evidence that it offers comparable security.

 

Considering your post history it seems like your approach to cryptographic "research" is to combine a random assortment of methods you found on the internet without doing the due diligence of figuring out if and why it might be better or worse than common solutions. That's not how this works. In cryptography the math comes first.

Fair enough.

On 6/8/2022 at 7:38 AM, Franck said:

Note that you should not be using RSA for encrypting 100 mb of data. RSA is meant for really small datasets as it's very slow the bigger the data. AES should be used for large dataset and streaming.

Good Idea.

On 6/8/2022 at 1:50 PM, Eigenvektor said:

To give you another example: Skype calls were "hacked", not because it's encryption algorithm was actually broken, but due to the fact that, for speed reasons, packages had a variable length. With some statistical analysis the length of these packages was enough to get a good enough guess which phonemes (word fragments) had generated them. This was enough to reconstruct the conversation.

Haven't heard of that before...

 

I've updated the code, benchmark, etc. with up to date information. I've managed to get what I was looking for an encryption method without size overhead. I took @Franck and @Sauron's advice and delt away with the random number generator. In its place I've put AES. However to avoid having to add padding to round to the nearest higher length of 16 bytes I overlapped an extra AES encrypt on the last 16 bytes. This last AES encrypt does use the ECB mode which is supposed to be a little less secure however only the last 16 bytes are encrypted with it. For the rest of the file after the RSA key length CBC mode is used. Effectively RSA+AES without padding.

 

Removed need of the ECB mode AES encryption.

Link to comment
Share on other sites

Link to post
Share on other sites

13 hours ago, Poet129 said:

Effectively RSA+AES without padding. 

You that that mixing the 2 wont make it more secure. All it does is making things more complicated to maintain and you introduce an extra point of failure.

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×