GaussDB(DWS) 8.2.0 and later provides a built-in cryptographic module pgcrypto. The pgcrypto module allows database users to store certain columns of data after encryption, enhancing sensitive data security. Users without the encryption key cannot read the encrypted data stored in GaussDB(DWS).
The pgcrypto function runs inside database servers, which means that all data and passwords are transmitted in plaintext between pgcrypto and client applications. For security purposes, you are advised to use the SSL connection between the client and the GaussDB(DWS) server.
The functions in the pgcrypto module are as follows.
digest(data text, type text) returns bytea digest(data bytea, type text) returns bytea
data indicates the original data, and type indicates the encryption algorithm (md5, sha1, sha224, sha256, sha384, sha512, or sm3). The return value of the function is a binary string.
Example:
Use the digest() function to encrypt the GaussDB(DWS) string using SHA256 for storage.
select digest('GaussDB(DWS)', 'sha256');
digest
--------------------------------------------------------------------
\xcc2d1b97c6adfba44bbce7386516f63f16fc6e6a10bd938861d3aba501ac8aab
(1 row)
hmac(data text, key text, type text) returns bytea hmac(data bytea, key bytea, type text) returns bytea
data indicates the original data, key indicates the encryption key, and type indicates the encryption algorithm (md5, sha1, sha224, sha256, sha384, sha512, or sm3). The return value of the function is a binary string.
Example:
Use key123 and the SHA256 algorithm to calculate the MAC value for the string GaussDB(DWS).
select hmac('GaussDB(DWS)', 'key123', 'sha256');
hmac
--------------------------------------------------------------------
\x14e1d9e110e9b11ab8379dc02b49533d50a6f4deafe6d6cd451d06c106c97d83
(1 row)
If both the original data and its encryption result are modified, the digest() function cannot identify the changes. The hmac() function can identify the changes as long as the key is not disclosed.
If the key is longer than the hash block, it will be hashed first, and the hash result will be used as the key.
The crypt() and gen_salt() functions are used for password hashing. crypt() executes hashes to encrypt data, and gen_salt() generates salted hashes.
The algorithms in crypt() differ from the common MD5 and SHA1 hash algorithms in the following aspects:
The following table lists the algorithms supported by the crypt() function.
Algorithm |
Maximum Password Length |
Adaptability |
Salt Bits |
Standard Output Length |
Description |
|---|---|---|---|---|---|
bf |
72 |
√ |
128 |
60 |
Blowfish-based 2a variation |
md5 |
unlimited |
× |
48 |
34 |
MD5-based algorithm |
xdes |
8 |
√ |
24 |
20 |
Extended DES |
des |
8 |
× |
12 |
13 |
Native UNIX algorithm |
This function returns a hash value of the password string in crypt(3) format. The salt parameter is generated by the gen_salt() function.
For the same password, the crypt() function returns a different result each time, because the gen_salt() function generates a different salt each time. During password verification, the previously generated hash result can be used as the salt.
For example, to set a new password, run the following command:
UPDATE ... SET pswhash = crypt('new password', gen_salt('bf',10));
The hash values of the entered password and the stored password are compared.
SELECT (pswhash = crypt('entered password', pswhash)) AS pswmatch FROM ... ;
If the entered password is correct, true is returned.
Example:
create table userpwd(userid int8, pwd text);
CREATE TABLE
insert into userpwd values (1, crypt('this is a pwd', gen_salt('bf',10)));
INSERT 0 1
select crypt('this is a pwd', pwd)=pwd as result from userpwd where userid =1;
result
--------
t
(1 row)
select crypt('this is a wrong pwd', pwd)=pwd as result from userpwd where userid =1;
result
--------
f
(1 row)
gen_salt(type text [, iter_count integer ]) returns text
This function generates a random salt string each time. The string determines the algorithm used by the crypt() function. The type parameter specifies a hash algorithm (des, xdes, md5, or bf) for generating a string. For the xdes and bf algorithms, iter_count indicates the number of iterations. A large value indicates a long encryption or cracking time.
1 2 3 4 5 | SELECT gen_salt('des'), gen_salt('xdes'), gen_salt('md5'), gen_salt('bf'); gen_salt | gen_salt | gen_salt | gen_salt ----------+-----------+-------------+------------------------------- qh | _J9..uEUi | $1$SNgqyKAi | $2a$06$B/Etc3J8zYBV49LrDU97MO (1 row) |
The salt generated by an algorithm has a fixed format. For example, in $2a$06$ in the bf algorithm result, 2a indicates the 2a variation of Blowfish, and 06 indicates the number of iterations. If iter_count is ignored, the default number of iterations will be used. The valid iter_count values depend on the algorithm used, as shown in the table below. For the xdes algorithm, the number of iterations must be an odd number.
Algorithm |
Default Value |
Min. |
Max. |
|---|---|---|---|
xdes |
725 |
1 |
16777215 |
bf |
6 |
4 |
31 |
The PGP encryption function of GaussDB(DWS) complies with the OpenPGP (RFC 4880) standard, which includes requirements for symmetric key (private key) encryption and asymmetric key (public key) encryption.
An encrypted PGP message consists of the following parts:
For symmetric key (password) encryption:
For public key encryption:
In either case, the data encryption process is as follows:
Supported PGP encryption functions
Description: Encrypts a symmetric key.
pgp_sym_encrypt(data text, psw text [, options text ]) returns bytea pgp_sym_encrypt_bytea(data bytea, psw text [, options text ]) returns bytea
data indicates the data to be encrypted, psw indicates the PGP symmetric key, and options is used to set options. For details, see Table 3.
Description: Decrypts a message encrypted using a PGP symmetric key.
pgp_sym_decrypt(msg bytea, psw text [, options text ]) returns text pgp_sym_decrypt_bytea(msg bytea, psw text [, options text ]) returns bytea
msg indicates the data to be decrypted, psw indicates the PGP symmetric key, and options is used to set options. For details, see Table 3. To avoid generating invalid characters, you are not allowed to use the pgp_sym_decrypt function to decrypt bytea data. You can use the pgp_sym_decrypt_bytea function instead.
Description: Encrypts a public key.
pgp_pub_encrypt(data text, key bytea [, options text ]) returns bytea pgp_pub_encrypt_bytea(data bytea, key bytea [, options text ]) returns bytea
data indicates the data to be encrypted. key indicates the PGP public key. If a private key is used as input, an error will be returned. options is used to set options. For details, see Table 3.
Description: Decrypts a message encrypted using a PGP public key.
pgp_pub_decrypt(msg bytea, key bytea [, psw text [, options text ]]) returns text pgp_pub_decrypt_bytea(msg bytea, key bytea [, psw text [, options text ]]) returns bytea
You can decrypt a message encrypted using a public key. The key must be the private key corresponding to the public key used for encryption. If the private key is password protected, specify the password in psw. If you have not specified any password but want to specify this option now, provide an empty password.
To avoid generating invalid characters, you are not allowed to use the pgp_pub_decrypt function to decrypt bytea data. You can use pgp_pub_decrypt_bytea function instead.
The key must be the private key corresponding to the public key used for encryption. If the private key is password protected, specify the password in psw. If you have not specified any password but want to specify this option now, provide an empty password. The options parameter is used to set options. For details, see Table 3.
Description: Extracts the key ID of the PGP public or private key. If an encrypted message is used as the input, the ID of the key used to encrypt the message will be returned.
pgp_key_id(bytea) returns text
This function can return two special key IDs:
Different keys may have the same ID. This situation rarely occurs. In this case, the client application needs to try different keys for decryption, in the same way it deals with ANYKEY.
Description: Converts binary data into PGP ASCII-armor format by the CRC calculation and formatting of a Base64 string.
Syntax:
armor(data bytea [ , keys text[], values text[] ]) returns text
Description: Performs the reverse conversion.
dearmor(data text) returns bytea
Converts the encrypted data bytea to the PGP ASCII-armor format, or the other way around.
data indicates the data to be converted. If multiple pairs of keys and values are specified, an armor header will be generated for each key-value pair and added to the output. The two arrays are both one-dimensional arrays with the same length, and cannot contain non-ASCII characters.
pgp_armor_headers(data text, key out text, value out text) returns setof record
The return result is a data row set consisting of key and value columns. Any non-ASCII characters contained in the set are regarded as UTF-8 characters.
Using GnuPG to generate PGP keys
Generate a key.
gpg --gen-key
DSA and Elgamal keys are recommended.
To use an RSA key, you must create a DSA or RSA key as the master key used only for signature, and then specify gpg --edit-key to add an RSA encryption subkey.
List keys.
gpg --list-secret-keys
Export a public key in ASCII-protected format.
gpg -a --export KEYID > public.key
Export a private key in ASCII-protected format.
gpg -a --export-secret-keys KEYID > secret.key
Before using these keys as the input to the PGP function, run dearmor() on them. Alternatively, if you can process binary data, remove -a from the command.
The PGP encryption function has the following restrictions:
PGP function parameters
The option names in the pgcrypto function are similar to those in the GnuPG function. Option values are set using equal signs (=), and the options are separated by commas (,). Example:
pgp_sym_encrypt(data, psw, 'compress-algo=1, cipher-algo=aes256')
Options other than convert-crlf can be used only for encryption functions. The decryption function obtains parameters from PGP data.
The most common options are compress-algo and unicode-mode. You can retain the default values for other options.
Option |
Description |
Default Value |
Value |
Function |
|---|---|---|---|---|
cipher-algo |
Cryptographic algorithm |
aes128 |
bf, aes128, aes192, aes256, 3des, cast5 |
pgp_sym_encrypt, pgp_pub_encrypt |
compress-algo |
Compression algorithm |
0 |
|
pgp_sym_encrypt, pgp_pub_encrypt |
compress-level |
Compression level. A high level indicates the compression will be slow, but the data size after compression will be small. 0 disables compression. |
6 |
0, 1-9 |
pgp_sym_encrypt, pgp_pub_encrypt |
convert-crlf |
Indicates whether to convert \n to \r\n during encryption, and whether to convert \r\n to \n during decryption. RFC4880 requires that \r\n must be used as the newline character in text data storage. |
0 |
0, 1 |
pgp_sym_encrypt, pgp_pub_encrypt, pgp_sym_decrypt, pgp_pub_decrypt |
disable-mdc |
SHA-1 is not used to protect data. It is used only for compatibility with old PGP products. |
0 |
0, 1 |
pgp_sym_encrypt, pgp_pub_encrypt |
sess-key |
A separate session key is used. Public key encryption always uses a separate session key. This option is used for symmetric key encryption, which directly uses the S2K key by default. |
0 |
0, 1 |
pgp_sym_encrypt |
s2k-mode |
S2K algorithm |
3 |
|
pgp_sym_encrypt |
s2k-count |
Number of iterations of the S2K algorithm |
A random value between 65,536 and 253,952. |
1024 ≤ Value ≤ 65,011,712 |
pgp_sym_encrypt and s2k-mode=3 |
s2k-digest-algo |
Digest algorithm used during S2K calculation |
sha1 |
md5, sha1 |
pgp_sym_encrypt |
s2k-cipher-algo |
Password used to encrypt a separate session key |
cipher-algo algorithm |
bf, aes, aes128, aes192, aes256 |
pgp_sym_encrypt |
unicode-mode |
Whether to convert text data between database internal encoding and UTF-8. If the database already uses UTF-8 encoding, no conversion will be performed, but the message will be marked as UTF-8. If this parameter is not specified, the message will not be marked. |
0 |
0, 1 |
pgp_sym_encrypt, pgp_pub_encrypt |
Raw encryption functions only run a cipher over data. They don't have any advanced features of PGP encryption. Therefore they have the following problems:
With the introduction of PGP encryption, these raw encryption functions are not recommended.
encrypt(data bytea, key bytea, type text) returns bytea decrypt(data bytea, key bytea, type text) returns bytea encrypt_iv(data bytea, key bytea, iv bytea, type text) returns bytea decrypt_iv(data bytea, key bytea, iv bytea, type text) returns bytea
data indicates the data to be encrypted, and type indicates the encryption/decryption method. The syntax of the type parameter is as follows:
algorithm [ - mode ] [ /pad: padding ]
The options of algorithm are as follows:
The options of mode are as follows:
The options of padding are as follows:
For example, the encryption results of the following functions are the same:
encrypt(data, 'fooz', 'bf') encrypt(data, 'fooz', 'bf-cbc/pad:pkcs')
For the encrypt_iv and decrypt_iv functions, the iv parameter indicates the initial value for the CBC mode. This parameter is ignored for ECB. It is truncated or padded with zeroes if not exactly block size. It defaults to all zeroes in the functions without this parameter.
gen_random_bytes(count integer) returns bytea
count indicates the number of returned bytes. The value range is 1 to 1024.
Example:
SELECT gen_random_bytes(16);
gen_random_bytes
------------------------------------
\x1f1eddc11153afdde0f9e1229f8f4caf
(1 row)
SELECT gen_random_uuid(); gen_random_uuid -------------------------------------- 2bd664a2-b760-4859-8af6-8d09ccc5b830