-
Support for Batch Processing of Student Files
- Automatically load multiple student Excel files from a specified folder.
- Process each row step by step, and add a new column in the original file to store the encrypted data.
-
Real-Time Row Progress Updates
- Utilize the
tqdm
progress bar library to clearly display file and row processing progress, enhancing user experience.
- Utilize the
-
Automatic Matching of Encryption Seeds
- Extract the year from the first four digits of the student ID and automatically match the corresponding salt value in the
seed
file, ensuring consistency and security in data encryption.
- Extract the year from the first four digits of the student ID and automatically match the corresponding salt value in the
-
Unmatched Data Prompts
- Automatically record student IDs that do not match a salt value and print them out after the task is completed, allowing for easy checking and correction.
-
Optimized Data Structure
- The encrypted column will be inserted before the original column being encrypted (e.g., "student ID"), keeping the table structure clear and logical.
-
Enhanced Error Handling
- If a file fails to be read or saved, it will be skipped, and detailed error information will be output without affecting the processing of other files.
- Install dependencies:
pip install pandas openpyxl tqdm
- Ensure the hash_256.py (Version 1.0) file is in the same directory as the main script.
- Format requirements for seed.xlsx:
- The file must contain two columns: year and seed, representing the year and the corresponding salt value, respectively.
- Place the student Excel files into the specified folder (e.g., ./student/), with the following requirements:
- The files must include the column to be encrypted (e.g., “student ID”).
- If the column to be encrypted is not “student ID,” modify the encrypt_col parameter in batch_process.py.
- Run the script:
python batch_process.py
import hashlib
def desensitize(private_info, seed: str) -> str:
"""
Process sensitive information using a hash algorithm with a salt value.
Parameters:
private_info (any): Sensitive information (e.g., name or ID)
seed (str): Salt value used to enhance hashing security
Returns:
str: Encrypted desensitized information
"""
if not private_info or not seed:
raise ValueError("Private information and salt value cannot be empty")
combined_info = f"{str(private_info)}{seed}"
# Encrypt the combined information using the SHA-256 hash algorithm
hash_object = hashlib.sha256(combined_info.encode('utf-8'))
encrypted_info = hash_object.hexdigest()
return encrypted_info
if __name__ == "__main__":
# Input private information and salt value
private_info = 2021213038
seed = "secure_salt_value_123"
# Call the desensitization function
encrypted_result = desensitize(private_info, seed)
print(f"Original Information: {private_info}")
print(f"Encrypted Information: {encrypted_result}")
Original Information: 2021213038
Encrypted Information: 69cb2854b108c36c078efb6bc1bd08c0ef3768a6e9d6af204a0b7ca18015dbcf
- Hash Algorithm:
- SHA-256 is used, a common and secure one-way hashing algorithm. It produces a fixed-length 64-character hexadecimal string as output.
- Being one-way means the original data cannot be restored from the encrypted result.
- Security Recommendations:
- Ensure the seed is a random and unpredictable string to avoid reduced security due to weak salt values (e.g., dictionary words).
- Input:
- private_info: any (supports str, int, float, bool, list, tuple, dict, set, object)
- seed: str
- Output:
- encrypted_info: str
Below is a table summarizing the theoretical input size limits for common hash algorithms (measured in bits):
Hash Algorithm | Input Length Limit | Output Length (Bits) |
---|---|---|
MD5 | 2³² - 1 bits (approx. 512 MB) | 128 bits |
SHA-1 | 2⁶⁴ - 1 bits (approx. 2 EB) | 160 bits |
SHA-256 | 2⁶⁴ - 1 bits (approx. 2 EB) | 256 bits |
SHA-512 | 2¹²⁸ - 1 bits (virtually infinite) | 512 bits |
- No Practical Input Length Limit: Modern hash algorithms (e.g., SHA-256, SHA-512) have no practical input length restrictions as they process data in chunks.
- Memory Constraints: While the algorithms theoretically accept infinite input lengths, processing very large inputs depends on system memory availability.
- Implementation Constraints: Specific programming libraries or environments may impose additional input size limits.
This implementation combines sensitive data (private_info) with a salt (seed) to generate a secure hash using the SHA-256 algorithm. It supports various data types as input and ensures data security through one-way encryption. The algorithm produces consistent and predictable output, making it suitable for desensitization and data anonymization.