Secure Lookup (Algorithm frameworks)

Secure Lookup is the most commonly used type of algorithm. It is easy to generate and works with different languages. When this algorithm replaces real, sensitive data with fictional data, it is possible that it will create repeating data patterns, known as “collisions.” For example, the names “Tom” and “Peter” could both be masked as “Matt”. Because names and addresses naturally recur in real data, this mimics an actual data set. However, if you want the Masking Engine to mask all data into unique outputs, you should use Character Mapping.

Version 6.0.4.0 introduced a built-in Extensible Secure Lookup Algorithm Framework. The new framework uses the SHA256 hashing method and allows case configurations for input and output (i.e., masked) values.
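As a rough illustration of how a hash-based lookup can behave, the following Python sketch hashes the input with SHA256 and reduces the digest to an index into the lookup list. The function name `secure_lookup`, the modulo step, and the lowercase normalization are assumptions made for illustration only; this page does not describe how the Masking Engine actually derives the index, and the real framework may use a key or other inputs.

```python
import hashlib


def secure_lookup(value: str, lookup_values: list[str], case_sensitive: bool = False) -> str:
    """Deterministically map a value onto one entry of lookup_values (illustrative only)."""
    # Assumption: case-insensitive lookups normalize the input before hashing.
    key = value if case_sensitive else value.lower()
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Assumption: the digest is reduced to a list index with a simple modulo.
    index = int.from_bytes(digest, "big") % len(lookup_values)
    return lookup_values[index]


cities = ["Smallville", "Clarkville", "Farmville", "Townville"]

# The same input always yields the same masked value.
print(secure_lookup("Peter", cities) == secure_lookup("Peter", cities))

# With case-insensitive lookup, "Peter" and "peter" share one masked value.
print(secure_lookup("Peter", cities) == secure_lookup("peter", cities))
```

Because the lookup list is finite, masking more distinct inputs than there are lookup values necessarily produces collisions, which is the behavior described above.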

Creating a secure lookup algorithm via UI

  1. At the top right of the Algorithms page, click + Algorithm.

  2. Enter an Algorithm Name.

    Info: This MUST be unique.

  3. Enter a Description.

  4. Select Secure Lookup as the Framework Name and click Next.

  5. Choose the Hash Method configuration for lookup determination.

    • SHA256 - This hash method is the default hash method for extensible secure lookup algorithms.

    • LEGACY - This hash method is used to mimic the legacy secure lookup behavior in the extensibility framework.

    • RANDOMIZE - This method replaces the input value with a random value from the lookup file. It internally uses a PRNG (pseudorandom number generator) to generate a random index with a uniform distribution (equal probability) which is subsequently used to fetch the value from the lookup file.

  6. Specify a Lookup File. This file is a single list of values and does not require a header; any line of the Lookup File may be used as a masked value. The Lookup File must use an ASCII- or UTF-8-compatible encoding. The Lookup File can be referenced locally or with a specified/uploaded URI. The following is sample file content:

    Smallville
    Clarkville
    Farmville
    Townville
    Cityname
    Citytown
    Towneaster

  7. Choose the Case Sensitive Lookup configuration. If the Case Sensitive Lookup box is selected, the same input in different cases will be masked to different values. For example:

    Peter -> John
    peter -> Andrew

    If the box is not selected (the default), the lookup is case insensitive. For example:

    Peter -> John
    peter -> John

  8. Choose the Trim Whitespace selections. The handling of whitespace should be carefully considered. By default, leading and trailing whitespace is preserved and will be passed back into the data around the new masked string. The following options are available:

    • Trim Whitespace in Lookup File: This assumes that any leading or trailing whitespace around strings in your lookup file is erroneous and should be removed. If such whitespace is part of your valid data format, this option should not be selected. If not selected, whitespace will be output as part of the masked data.

    • Trim Whitespace from Inputs: This assumes that any whitespace in your unmasked source data is erroneous and should be removed. If such whitespace is part of your valid data format, or you want to leave erroneous whitespace in place to simulate realistic dirty data, this option can be left unselected.

Please note that the options selected could cause the following problems if not carefully considered:

  1. The job fails to mask a table or file, with the database or connector reporting that a string is longer than the length supported by the column or field.

    1. If Trim Whitespace from Input is not selected, the field is space padded to its maximum length, and the masked string is longer than the unmasked source string, the result could exceed the maximum length allowed. This is more likely to occur with non-ASCII data, because databases and connectors are inconsistent about whether they report field length in bytes or in characters.

  2. The job succeeds but referential integrity is broken at the application level. In rare cases, two fields that should have referential integrity may be defined with different maximum lengths. If whitespace is part of the valid data format, output whitespace could be trimmed to fit the shorter field but stored untrimmed in the other field.

    1. If such a possibility exists, users should select Trim Whitespace from Input, deselect Trim Whitespace in Lookup File, and provide any required whitespace via the lookup file up to, but not exceeding, the maximum length of the shortest field masked by the same algorithm.

  9. Choose the Output (Masked) Case configuration.

    1. Preserve Lookup File Case - keep the masked value as found in the Lookup File.

    2. Preserve Input Case - check the input case, which can be one of the following three:

      1. All uppercase - in that case, force the whole masked value to uppercase.

      2. All lowercase - in that case, force the whole masked value to lowercase.

      3. Mixed (if at least 1 character case is different from others) - in that case, keep the masked value as found in the Lookup File.

    3. Force all Uppercase - forces the whole masked value to uppercase.

    4. Force all Lowercase - forces the whole masked value to lowercase.

  10. Click Next to verify details on the Summary step.

  11. When you are finished, click Save.
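The whitespace and output case options in the steps above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the engine's implementation: the helper names (`split_whitespace`, `rewrap`, `apply_output_case`) and the mode strings are hypothetical, and the masked value is supplied by hand rather than looked up.

```python
def split_whitespace(value: str) -> tuple[str, str, str]:
    """Return (leading whitespace, core text, trailing whitespace)."""
    stripped_left = value.lstrip()
    leading = value[: len(value) - len(stripped_left)]
    core = stripped_left.rstrip()
    trailing = stripped_left[len(core):]
    return leading, core, trailing


def rewrap(original: str, masked: str, trim_input: bool) -> str:
    """Model of whitespace handling: without trimming, the input's
    surrounding whitespace is passed back around the masked string."""
    if trim_input:
        return masked
    leading, _, trailing = split_whitespace(original)
    return leading + masked + trailing


def apply_output_case(masked: str, original: str, mode: str) -> str:
    """Apply a hypothetical output case mode to the masked value."""
    if mode == "FORCE_UPPER":
        return masked.upper()
    if mode == "FORCE_LOWER":
        return masked.lower()
    if mode == "PRESERVE_INPUT":
        if original.isupper():
            return masked.upper()
        if original.islower():
            return masked.lower()
    # "PRESERVE_LOOKUP", or mixed-case input: keep the lookup file case.
    return masked


# A 7-character padded field receives a longer masked value.
result = rewrap("  Tom  ", "Matthew", trim_input=False)
print(repr(result), len(result))  # '  Matthew  ' 11

print(apply_output_case("Matt", "TOM", "PRESERVE_INPUT"))  # MATT
```

In this model, the untrimmed result grows from 7 to 11 characters, which is the kind of overflow that can make a job fail when the target column is limited to the original field length.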

  • Because the RANDOMIZE hash method chooses a random replacement value every time, it does not maintain referential integrity and should never be used where referential integrity is required. It is, however, useful in cases where masked values need to reflect a statistical distribution (e.g., genders or a population distribution).

  • Before using the algorithm in a profiling job, you must add it to a domain.
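To make the referential integrity point about RANDOMIZE concrete, the Python sketch below contrasts a deterministic hash-based choice with a uniformly random one. Everything here is illustrative: the function names, the modulo indexing, and the idea of repeating a value in the lookup file to weight the distribution are assumptions that this page does not confirm.

```python
import hashlib
import random
from collections import Counter

# Hypothetical lookup file lines; "A" appears three times, "B" once.
lookup_values = ["A", "A", "A", "B"]


def deterministic_mask(value: str) -> str:
    """Same input always maps to the same line (illustrative hash-based choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return lookup_values[int.from_bytes(digest, "big") % len(lookup_values)]


def randomize_mask(value: str, rng: random.Random) -> str:
    """Ignore the input and pick a line index with uniform probability."""
    return lookup_values[rng.randrange(len(lookup_values))]


rng = random.Random(42)  # seeded only to make this sketch repeatable

# The same key masked in two tables: deterministic results always agree.
print(deterministic_mask("Peter") == deterministic_mask("Peter"))

# Random results for the same key can differ between calls, so joins may break.
print([randomize_mask("Peter", rng) for _ in range(5)])

# Over many rows, output frequencies follow the line frequencies in the file.
counts = Counter(randomize_mask("x", rng) for _ in range(10_000))
print(counts)
```

Under these assumptions, a uniform choice over lines means the mix of values in the lookup file shapes the mix of masked values, while no link between a given input and its output is retained.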

For information on creating Secure Lookup algorithms through the API, see API Calls for Creating Algorithms - Secure Lookup.