Mammogram Deidentification

deidcm.dicom.deid_mammogram

This module is a mammogram deidentification toolbox.

It contains functions for deidentifying mammograms and serves two purposes:

  • deidentifying mammogram images
  • deidentifying mammogram metadata

Deidentification Functionalities

  • Image Deidentification based on OCR
  • Attributes/Metadata Deidentification based on a Recipe

Image Deidentification

deidentify_image_png(infile, outdir, filename)

Deidentify a given mammogram's image and write it in outdir as filename.png.

This function invokes the OCR reader to get all potential words on a mammogram's image. It then hides every word it found by covering it in black.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| infile | str | The path of the DICOM file to deidentify. | required |
| outdir | str | The path of the directory that will store the output. | required |
| filename | str | The name of the resulting PNG file, without the file extension. | required |

get_PIL_image(dataset)

Get an Image object from the Python Imaging Library (PIL).

Get the image from the pydicom dataset and convert it from a numpy.ndarray to a PIL image object. If available, the function will use metadata information contained inside the pydicom dataset for the conversion.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset | Dataset | A pydicom dataset which can be obtained from a DICOM file. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Image | Image | A PIL image object. |

Example
get_PIL_image.py
from deidcm.dicom.deid_mammogram import get_PIL_image
import pydicom

# dcmread replaces the deprecated read_file
ds = pydicom.dcmread("my-mammogram.dcm")
img = get_PIL_image(ds)
img.show()

get_text_areas(pixels, languages=['fr'])

Read and return words of an image.

This function takes a pixel array as input and submits it to the easyOCR Reader, which returns a list of the words it finds. The function implicitly removes authorized words from that list before returning it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pixels | ndarray | An array representing an image. | required |
| languages | list | A list of supported languages for the OCR Reader. This makes it possible to submit images with text written in different languages. | ['fr'] |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| list | list | A list of words detected on the submitted image. |

Info

The list of available languages can be found in the easyOCR documentation.

remove_authorized_words_from(ocr_data)

Remove authorized words from ocr_data list

This function removes authorized words from the easyOCR output. It is useful if you want to keep some text on your image, such as image laterality information (RMLO, LCC, OBLIQUE G...).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| ocr_data | list | A list of words and coordinates obtained after submitting an image to easyOCR Reader. | required |

Returns:

| Type | Description |
| --- | --- |
| list | The same list of words and coordinates minus the authorized words elements. |
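
The filtering step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes each OCR entry is a (bounding box, text, confidence) tuple, which is the shape easyOCR's readtext returns, and it assumes a case-insensitive exact match. The name filter_authorized is hypothetical.

```python
def filter_authorized(ocr_data, authorized_words):
    """Return the OCR entries whose text is not an authorized word."""
    # Normalize once so the comparison ignores case and surrounding spaces
    # (the matching rule is an assumption made for this sketch).
    allowed = {word.strip().upper() for word in authorized_words}
    return [entry for entry in ocr_data if entry[1].strip().upper() not in allowed]


# Each entry: (four corner points, detected text, confidence score)
ocr_data = [
    ([[10, 10], [90, 10], [90, 30], [10, 30]], "RMLO", 0.98),
    ([[10, 50], [200, 50], [200, 70], [10, 70]], "DUPONT MARIE", 0.91),
]
remaining = filter_authorized(ocr_data, ["RMLO", "LCC"])
print([entry[1] for entry in remaining])  # ['DUPONT MARIE']
```

Only the entries left in the returned list would then be passed on for censoring.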

Info

For more information on how to define your own list of authorized words, go to Customize Deidentification Tasks

hide_text(pixels, ocr_data, color_value='black', mode='rectangle', margin=300)

Censor text present on the pixels array representing an image.

Takes the input image and draws shapes over it with the PIL package in order to censor the words detected by OCR.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pixels | ndarray | A pixel array representing an image. | required |
| ocr_data | list | A list of words and coordinates obtained by easyOCR Reader after submitting an image. | required |
| color_value | str | A string indicating the color of the rectangle used for censoring information (white or black). | 'black' |
| mode | str | A string indicating the method for censoring information (blur or rectangle). | 'rectangle' |

Returns:

| Type | Description |
| --- | --- |
| ndarray | The deidentified pixels array. |
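
To make the rectangle mode more concrete, here is a dependency-free sketch of the idea: take the four corner points of each detected word, compute the enclosing rectangle, and overwrite those pixels. It uses nested lists instead of a numpy array and a plain loop instead of PIL drawing calls, so it is only an illustration of the technique. The name hide_boxes and the (box, text, confidence) entry shape are assumptions.

```python
def hide_boxes(pixels, ocr_data, fill=0):
    """Return a copy of `pixels` with every OCR bounding box filled with `fill`."""
    height, width = len(pixels), len(pixels[0])
    censored = [row[:] for row in pixels]  # copy so the input stays untouched
    for box, _text, _confidence in ocr_data:
        xs = [int(x) for x, _ in box]
        ys = [int(y) for _, y in box]
        # Clamp the axis-aligned bounding rectangle to the image borders.
        left, right = max(min(xs), 0), min(max(xs), width - 1)
        top, bottom = max(min(ys), 0), min(max(ys), height - 1)
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                censored[y][x] = fill
    return censored


image = [[255] * 6 for _ in range(4)]  # 4 rows x 6 columns, all white
word = ([[1, 1], [3, 1], [3, 2], [1, 2]], "DUPONT", 0.9)
for row in hide_boxes(image, [word]):
    print(row)
```

Rows 1 and 2 of the printed result have columns 1 to 3 set to 0 (black), while the original image list is left unchanged.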

Attributes Deidentification

deidentify_attributes(indir, outdir, org_root, erase_outdir=True)

Produce a Pandas dataframe with deidentified information from a folder of DICOM files.

This function creates a Pandas dataframe from all files present in the indir folder. Then, it loads the deidentification recipe and iterates through the dataframe to deidentify its content. Finally, it returns the deidentified dataframe object.

It also takes the outdir and erase_outdir arguments to handle automatic cleaning of the output directory in the context of a data pipeline. If you are not interested in automatically cleaning your output directory, simply specify outdir and set erase_outdir to False.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| indir | str | The input directory (DICOM files to deidentify) | required |
| outdir | str | The output directory (deidentified/resulting files) | required |
| org_root | str | An organization root identifier for deidentifying DICOM UIDs. | required |
| erase_outdir | bool | Empty the output directory if True | True |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | A Pandas dataframe containing all metadata/attributes information. |
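
The effect of erase_outdir can be pictured with the sketch below, which empties a directory only when the flag is set. It is a standalone illustration built on the standard library. The helper name prepare_outdir is hypothetical, and creating the directory when it is missing is an assumption rather than documented behavior.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_outdir(outdir, erase_outdir=True):
    """Create `outdir` if needed and optionally remove everything inside it."""
    path = Path(outdir)
    path.mkdir(parents=True, exist_ok=True)  # assumption: a missing directory is created
    if erase_outdir:
        for child in path.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
    return path


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "old.csv").write_text("stale")
    prepare_outdir(tmp, erase_outdir=False)
    print(sorted(p.name for p in Path(tmp).iterdir()))  # ['old.csv']
    prepare_outdir(tmp, erase_outdir=True)
    print(sorted(p.name for p in Path(tmp).iterdir()))  # []
```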

Info

org_root refers to a prefix used for deidentifying DICOM UIDs. This prefix has to be unique for your organization.

For more information, see NEMA DICOM Standards Documentation.
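
One way to picture how an org_root prefix can be combined with a replacement suffix is the hash-based sketch below. The source does not say how deidcm derives the suffix, so the SHA-256 step and the function name pseudonymize_uid are assumptions made for illustration. The 64-character limit on UIDs comes from the DICOM standard.

```python
import hashlib

MAX_UID_LENGTH = 64  # DICOM limits a UID to 64 characters


def pseudonymize_uid(original_uid, org_root):
    """Derive a stable replacement UID that starts with `org_root`."""
    digest = hashlib.sha256(original_uid.encode("ascii")).hexdigest()
    suffix = str(int(digest, 16))  # decimal digits only
    room = MAX_UID_LENGTH - len(org_root) - 1  # 1 for the separating dot
    if room < 1:
        raise ValueError("org_root is too long to leave room for a suffix")
    return f"{org_root}.{suffix[:room]}"


new_uid = pseudonymize_uid("1.123.123.1234.123456.12345678", "9.9.9.9.9")
print(new_uid.startswith("9.9.9.9.9."), len(new_uid) <= MAX_UID_LENGTH)  # True True
```

Because the suffix is derived from the original UID, the same input always maps to the same output, which keeps references between files consistent after deidentification.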

Example

Let's test our recipe by adding one of its attributes to a pydicom dataset. The attribute in our recipe looks like this:

"0x00209161": [
    "ConcatenationUID",
    "UI",
    "PSEUDONYMISER"
],

Step 1: We add the new DICOM UID to our pydicom dataset

import pydicom

# dcmread replaces the deprecated read_file
ds = pydicom.dcmread("my-mammogram.dcm")
ds.add_new("0x00209161", "UI", "1.123.123.1234.123456.12345678")
ds.save_as("my-modified-mammogram.dcm")

It will then appear inside your pydicom dataset:

(0020, 9161) Concatenation UID                   UI: 1.123.123.1234.123456.12345678

Step 2: We deidentify the folder containing our test mammogram

from deidcm.dicom.deid_mammogram import deidentify_attributes

df = deidentify_attributes("/path/to/mammogram/folder", "/path/to/outdir", org_root="9.9.9.9.9", erase_outdir=False)
print(df.ConcatenationUID_0x00209161_UI_1____)
# Output: 9.9.9.9.9.474079559915109435636573090782

get_general_rule(tag, recipe)

Get the rule associated with the given tag in recipe.json

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tag | str | A DICOM tag | required |
| recipe | dict | A Python dictionary containing recipe elements. See load_recipe() | required |

Returns:

| Type | Description |
| --- | --- |
| str | The action associated with this DICOM tag in the provided recipe. It is one of the deidentification actions: CONSERVER (keep), RETIRER (remove), EFFACER (erase) or PSEUDONYMISER (pseudonymize). |

Note

This function is implicitly called by deidentify_attributes each time it needs to take a deidentification action.

Warning

This function takes a zero-trust approach when encountering unknown tags: it always returns RETIRER (= REMOVE) for any tag not found inside the recipe.
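
The default-to-remove behavior can be sketched as a plain dictionary lookup. The recipe layout (tag mapped to [name, VR, action]) follows the entries shown elsewhere on this page. The function name lookup_general_rule is hypothetical and the code is an illustration, not the library's source.

```python
DEFAULT_RULE = "RETIRER"  # unknown tags are removed


def lookup_general_rule(tag, recipe):
    """Return the action stored for `tag`, or RETIRER when the tag is unknown."""
    entry = recipe.get(tag)
    if entry is None:
        return DEFAULT_RULE
    _name, _vr, action = entry
    return action


recipe = {
    "0x00020000": ["FileMetaInformationGroupLength", "UL", "CONSERVER"],
    "0x00209161": ["ConcatenationUID", "UI", "PSEUDONYMISER"],
}
print(lookup_general_rule("0x00020000", recipe))  # CONSERVER
print(lookup_general_rule("0x00026666", recipe))  # RETIRER
```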

Example

Example 1: Retrieve a rule for a tag inside the recipe

get_general_rule_for_known_tag.py
from deidcm.config import Config
from deidcm.dicom.deid_mammogram import get_general_rule

recipe = Config().recipe
rule = get_general_rule("0x00020000", recipe)
print(rule)
# Output: CONSERVER

Example 2: Retrieve a rule for a tag that is not declared inside the recipe

get_general_rule_for_unknown_tag.py
from deidcm.config import Config
from deidcm.dicom.deid_mammogram import get_general_rule

recipe = Config().recipe
rule = get_general_rule("0x00026666", recipe)
print(rule)
# Output: RETIRER

get_specific_rule(tags, recipe)

Extract the specific rule from a list of tags in recipe.json if there is one.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tags | List[str] | A list of DICOM tags. The parent attribute is always before the child attribute. For instance, if we take ['AAA', 'BBB', 'CCC'], 'AAA' is a sequence containing 'BBB' and 'BBB' is a sequence containing the attribute 'CCC'. | required |
| recipe | dict | A Python dictionary containing recipe elements. See load_recipe() | required |

Returns:

| Type | Description |
| --- | --- |
| str | The action associated with this DICOM tag in the provided recipe, with the same possible values as get_general_rule. It can also return None if no specific rule is defined for the tags in the list. |

Customize Deidentification Tasks

deidcm.config.Config

This class is used to change the configuration of your environment.

This singleton object has to be instantiated for deidentification tasks. It allows you to define the path to a custom recipe and the path to an authorized_words.txt file.

  • recipe.json: a JSON file that contains the recipe orchestrating the attribute deidentification process.
  • authorized_words.txt: a TXT file that contains one word per line. Each word will be kept on the image even if it is detected by the OCR reader.
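
As an illustration of what these two files might contain, the sketch below writes a tiny recipe.json and authorized_words.txt into a temporary directory. The recipe entries reuse the tag, name, VR and action layout shown in the examples on this page. Treating the file as a flat JSON object of such entries is an assumption, and the helper name write_sample_config is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_sample_config(directory):
    """Write a tiny recipe.json and authorized_words.txt into `directory`."""
    directory = Path(directory)
    recipe = {
        "0x00020000": ["FileMetaInformationGroupLength", "UL", "CONSERVER"],
        "0x00209161": ["ConcatenationUID", "UI", "PSEUDONYMISER"],
    }
    recipe_path = directory / "recipe.json"
    recipe_path.write_text(json.dumps(recipe, indent=4), encoding="utf-8")

    words_path = directory / "authorized_words.txt"
    words_path.write_text("RMLO\nLCC\n", encoding="utf-8")  # one word per line
    return recipe_path, words_path


with tempfile.TemporaryDirectory() as tmp:
    recipe_path, words_path = write_sample_config(tmp)
    print(recipe_path.name, words_path.name)  # recipe.json authorized_words.txt
    # These two paths are what would be handed to
    # Config(recipe_path=..., authorized_words_path=...).
```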
__new__(recipe_path=None, authorized_words_path=None)

Create a new instance of Config if it does not exist.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| recipe_path | str | The path of your custom recipe.json file. | None |
| authorized_words_path | str | The path of your custom authorized_words.txt file. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Config | Self | The single instance of the Config class. |
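
For readers unfamiliar with the pattern, a singleton built on __new__ can be sketched as below. SketchConfig is a hypothetical stand-in, not the real Config class. In particular, ignoring the arguments of later calls is a choice made for this sketch, since the source does not say how deidcm handles them.

```python
class SketchConfig:
    """Hypothetical stand-in for Config, illustrating a __new__-based singleton."""

    _instance = None

    def __new__(cls, recipe_path=None, authorized_words_path=None):
        if cls._instance is None:
            # First call: build the single instance and remember the paths.
            cls._instance = super().__new__(cls)
            cls._instance.recipe_path = recipe_path
            cls._instance.authorized_words_path = authorized_words_path
        return cls._instance


first = SketchConfig(recipe_path="/path/to/custom-recipe.json")
second = SketchConfig()
print(first is second)     # True
print(second.recipe_path)  # /path/to/custom-recipe.json
```

Every call returns the same object, so configuration set once is visible wherever the class is instantiated again.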

Example

Default Configuration (inbuilt recipe, no authorized words)

default_config.py
from deidcm.config import Config

config = Config()
print(config.recipe)

Custom Configuration

custom_config.py
from deidcm.config import Config

config = Config(recipe_path="/path/to/custom-recipe.json", authorized_words_path="/path/to/authorized_words.txt")
print(config.recipe)

deidcm.config.Config.load_recipe(recipe_filepath) classmethod

Get the recipe from recipe.json and load it into a python dict.

This function reads recipe.json. If a user-defined version of the file is detected, it will be used. Otherwise, the inbuilt version of the file will be used.

Be aware that the inbuilt version of the file does not suit a generic usage. It was created for the Deep.piste study. It is highly recommended to create your own version of recipe.json.

Returns:

| Type | Description |
| --- | --- |
| dict | A Python dictionary with recipe elements. |

Note

You don't have to call this function, as it is already called implicitly when you instantiate the Config object.

Tip

This function can be called to check if your customized recipe is correctly detected by deidcm.
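
Beyond checking that the file is detected, a custom recipe can be sanity-checked before use. The sketch below assumes every entry has the [name, VR, action] layout shown in the examples on this page and that the valid actions are CONSERVER, RETIRER, EFFACER and PSEUDONYMISER. The helper check_recipe is hypothetical and not part of deidcm.

```python
import json

# Assumed set of valid actions, taken from the get_general_rule description.
ALLOWED_ACTIONS = {"CONSERVER", "RETIRER", "EFFACER", "PSEUDONYMISER"}


def check_recipe(recipe):
    """Return a list of human-readable problems found in a recipe dict."""
    problems = []
    for tag, entry in recipe.items():
        if not (isinstance(entry, list) and len(entry) == 3):
            problems.append(f"{tag}: expected [name, VR, action]")
            continue
        action = entry[2]
        if action not in ALLOWED_ACTIONS:
            problems.append(f"{tag}: unknown action {action!r}")
    return problems


recipe = json.loads("""
{
    "0x00020000": ["FileMetaInformationGroupLength", "UL", "CONSERVER"],
    "0x00209161": ["ConcatenationUID", "UI", "PSEUDONYMIZE"]
}
""")
print(check_recipe(recipe))  # ["0x00209161: unknown action 'PSEUDONYMIZE'"]
```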

Example
example_load_recipe.py
from deidcm.config import Config

config = Config(recipe_path="/path/to/custom-recipe.json", authorized_words_path="/path/to/authorized_words.txt")
print(config.recipe)
# Output: {'0x00020000': ['FileMetaInformationGroupLength', 'UL', 'CONSERVER'], '0x00020001': ['FileMetaInformationVersion', 'OB', 'CONSERVER']}

deidcm.config.Config.load_authorized_words(authorized_words_filepath) classmethod

Get and load the list of authorized words from authorized_words.txt.

This function reads authorized_words.txt and loads it into a Python list. If the file is not defined, the deidentification process will erase all detected words.

Returns:

| Type | Description |
| --- | --- |
| list | A Python list of authorized words |
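
The loading behavior described above (one word per line, and nothing authorized when no file is defined) can be sketched with the standard library alone. The function name read_authorized_words is hypothetical, and skipping blank lines is an assumption made for this sketch.

```python
import tempfile
from pathlib import Path


def read_authorized_words(path=None):
    """Return the words listed one per line in `path`, or [] if there is no file."""
    if path is None or not Path(path).is_file():
        return []  # nothing is authorized, so every detected word gets hidden
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    words_file = Path(tmp) / "authorized_words.txt"
    words_file.write_text("RMLO\nLCC\n\nOBLIQUE G\n", encoding="utf-8")
    print(read_authorized_words(words_file))  # ['RMLO', 'LCC', 'OBLIQUE G']

print(read_authorized_words())  # []
```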

Note

You don't have to call this function, as it is already called implicitly when you instantiate the Config object.

Tip

This function can be called to check if your customized list of authorized words is correctly detected by deidcm.

Example
example_load_authorized_words.py
from deidcm.config import Config

config = Config(recipe_path="/path/to/custom-recipe.json", authorized_words_path="/path/to/authorized_words.txt")
print(config.authorized_words)
# Output: ['HELLO', 'ALTER', 'DSQLD', 'SHOCR']