Azure Health Data Services de-identification service client library for Java - version 1.0.0

This package contains a client library for the de-identification service in Azure Health Data Services which enables users to tag, redact, or surrogate health data containing Protected Health Information (PHI). For more on service functionality and important usage considerations, see the de-identification service overview.

Getting started

Prerequisites

Adding the package to your product

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-health-deidentification</artifactId>
    <version>1.0.0</version>
</dependency>

Authentication

Both the asynchronous and synchronous clients can be created by using DeidentificationClientBuilder. Invoking buildClient will create the synchronous client, while invoking buildAsyncClient will create its asynchronous counterpart.

You will need a service URL to instantiate a client object. You can find the service URL for a particular resource in the Azure portal, or using the Azure CLI:

# Get the service URL for the resource
az deidservice show --name "<resource-name>" --resource-group "<resource-group-name>" --query "properties.serviceUrl"

Optionally, save the service URL as an environment variable named DEID_ENDPOINT for the sample client initialization code.
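When the endpoint comes from an environment variable, a missing or malformed value would otherwise only surface when the first request fails. The following is a minimal sketch of a fail-fast check that uses only the JDK. The class and method names (EndpointConfig, requireEndpoint) are hypothetical helpers rather than part of the SDK, the https requirement is an assumption about how strictly you want to validate, and https://example.invalid is a placeholder, not a real service URL.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collections;
import java.util.Map;

// Hypothetical helper, not part of the SDK: validates the DEID_ENDPOINT value.
final class EndpointConfig {
    private EndpointConfig() { }

    // Returns the trimmed endpoint, or throws if it is missing or not an absolute https URL.
    static String requireEndpoint(Map<String, String> env) {
        String value = env.get("DEID_ENDPOINT");
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("DEID_ENDPOINT is not set");
        }
        URI uri;
        try {
            uri = new URI(value.trim());
        } catch (URISyntaxException e) {
            throw new IllegalStateException("DEID_ENDPOINT is not a valid URI: " + value, e);
        }
        // Assumption: only absolute https URLs are accepted.
        if (!"https".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalStateException("DEID_ENDPOINT must be an absolute https URL: " + value);
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        // Placeholder value; in an application you would pass System.getenv() instead.
        Map<String, String> env = Collections.singletonMap("DEID_ENDPOINT", "https://example.invalid");
        System.out.println(requireEndpoint(env));
    }
}
```

Passing the environment as a Map keeps the check easy to exercise without changing the real process environment.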

The Azure Identity package provides the default implementation for authenticating the client. You can use DefaultAzureCredential to automatically find the best credential to use at runtime.

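import com.azure.core.util.Configuration;
import com.azure.health.deidentification.DeidentificationClient;
import com.azure.health.deidentification.DeidentificationClientBuilder;
import com.azure.identity.DefaultAzureCredentialBuilder;
DeidentificationClient deidentificationClient = new DeidentificationClientBuilder()
    .endpoint(Configuration.getGlobalConfiguration().get("DEID_ENDPOINT"))
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildClient();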

Key concepts

De-identification operations:

Given an input text, the de-identification service can perform three main operations:

  • Tag returns the category and location within the text of detected PHI entities.
  • Redact returns output text where detected PHI entities are replaced with placeholder text. For example, John is replaced with [name].
  • Surrogate returns output text where detected PHI entities are replaced with realistic replacement values. For example, My name is John Smith could become My name is Tom Jones.
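To make the difference between the three outputs concrete, here is a toy illustration in plain Java. It does not call the service and does not detect anything: the entity position is hard-coded, the surrogate value is supplied by hand, and the names OperationShapes, Entity, tag, redact, and replaceSpans are hypothetical. Offsets are treated as Java char indices, which is an assumption made for this sketch only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy illustration only: entity positions are hard-coded, nothing is detected here.
final class OperationShapes {

    // One entity: its category and where it sits in the input (Java char indices).
    static final class Entity {
        final String category;
        final int offset;
        final int length;

        Entity(String category, int offset, int length) {
            this.category = category;
            this.offset = offset;
            this.length = length;
        }
    }

    // Tag-style output: describe the entity and leave the text untouched.
    static String tag(String text, Entity e) {
        return e.category + " at " + e.offset + ", length " + e.length + ": "
            + text.substring(e.offset, e.offset + e.length);
    }

    // Replaces each entity span with the replacement at the same index.
    // Assumes entities are sorted by offset and do not overlap.
    static String replaceSpans(String text, List<Entity> entities, List<String> replacements) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (int i = 0; i < entities.size(); i++) {
            Entity e = entities.get(i);
            out.append(text, cursor, e.offset);
            out.append(replacements.get(i));
            cursor = e.offset + e.length;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    // Redact-style output: each span becomes a [category] placeholder.
    static String redact(String text, List<Entity> entities) {
        List<String> placeholders = new ArrayList<>();
        for (Entity e : entities) {
            placeholders.add("[" + e.category + "]");
        }
        return replaceSpans(text, entities, placeholders);
    }

    public static void main(String[] args) {
        String text = "My name is John Smith";
        List<Entity> entities = Collections.singletonList(new Entity("name", 11, 10));

        System.out.println(tag(text, entities.get(0)));   // name at 11, length 10: John Smith
        System.out.println(redact(text, entities));       // My name is [name]
        // Surrogate-style output: the replacement value is hand-picked here.
        System.out.println(replaceSpans(text, entities, Collections.singletonList("Tom Jones")));
    }
}
```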

Available endpoints

There are two ways to interact with the de-identification service. You can send text directly, or you can create jobs to de-identify documents in Azure Storage.

You can de-identify text directly using the DeidentificationClient:

String inputText = "Hello, my name is John Smith.";

DeidentificationContent content = new DeidentificationContent(inputText);
content.setOperationType(DeidentificationOperationType.SURROGATE);

DeidentificationResult result = deidentificationClient.deidentifyText(content);
System.out.println("De-identified output: " + (result != null ? result.getOutputText() : null));
// De-identified output: Hello, my name is <synthetic name>.

To de-identify documents in Azure Storage, see Tutorial: Configure Azure Storage to de-identify documents for prerequisites and configuration options. In the sample code below, populate the STORAGE_ACCOUNT_NAME and STORAGE_CONTAINER_NAME environment variables with your desired values. To refer to the same job between multiple examples, set the DEID_JOB_NAME environment variable.
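Because the container URL is assembled from two environment variables, an unset variable can end up as the literal text null inside the URL. The sketch below builds the URL with basic validation using only the JDK. StorageLocations and containerUrl are hypothetical names, the patterns reflect the Azure Storage naming rules for accounts and ordinary containers as I understand them (special containers such as $root are not covered), and examplestorage and patient-data are placeholder values.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the SDK: builds a blob container URL.
final class StorageLocations {
    // Assumed rule: 3 to 24 lowercase letters or digits.
    private static final Pattern ACCOUNT = Pattern.compile("[a-z0-9]{3,24}");
    // Assumed rule: lowercase letters, digits, and single hyphens between them.
    private static final Pattern CONTAINER = Pattern.compile("[a-z0-9](?:-?[a-z0-9])*");

    private StorageLocations() { }

    static String containerUrl(String account, String container) {
        if (account == null || !ACCOUNT.matcher(account).matches()) {
            throw new IllegalArgumentException("Invalid storage account name: " + account);
        }
        boolean containerOk = container != null
            && container.length() >= 3
            && container.length() <= 63
            && CONTAINER.matcher(container).matches();
        if (!containerOk) {
            throw new IllegalArgumentException("Invalid container name: " + container);
        }
        return "https://" + account + ".blob.core.windows.net/" + container;
    }

    public static void main(String[] args) {
        // Placeholder names; substitute the values of your own environment variables.
        System.out.println(containerUrl("examplestorage", "patient-data"));
        // https://examplestorage.blob.core.windows.net/patient-data
    }
}
```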

The synchronous client's beginDeidentifyDocuments method returns a SyncPoller, and the asynchronous client's returns a PollerFlux. Callers should wait for the operation to complete, for example by calling getFinalResult() or, as in the sample below, waitForCompletion():

String storageLocation = "https://" + Configuration.getGlobalConfiguration().get("STORAGE_ACCOUNT_NAME") + ".blob.core.windows.net/" + Configuration.getGlobalConfiguration().get("STORAGE_CONTAINER_NAME");
DeidentificationJob job = new DeidentificationJob(
    new SourceStorageLocation(storageLocation, "data/example_patient_1"),
    new TargetStorageLocation(storageLocation, "_output")
        .setOverwrite(true)
);

job.setOperationType(DeidentificationOperationType.REDACT);

String jobName = Configuration.getGlobalConfiguration().get("DEID_JOB_NAME", "MyJob-" + Instant.now().toEpochMilli());
DeidentificationJob result = deidentificationClient.beginDeidentifyDocuments(jobName, job)
    .waitForCompletion()
    .getValue();
System.out.println(jobName + " - " + result.getStatus());
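Conceptually, waiting for a long-running job means checking its status repeatedly until it reaches a terminal state. The sketch below shows that idea in plain Java with a fake status source. It is not how the SDK's poller is implemented; PollUntilDone and pollUntilTerminal are hypothetical names, and the status strings are placeholders rather than the service's documented values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

// Conceptual sketch only; the SDK's SyncPoller handles this for you.
final class PollUntilDone {
    private PollUntilDone() { }

    // Calls fetchStatus until it returns a terminal status or attempts run out.
    static String pollUntilTerminal(Supplier<String> fetchStatus, Set<String> terminal,
                                    int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String status = fetchStatus.get();
            if (terminal.contains(status)) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while polling", e);
                }
            }
        }
        throw new IllegalStateException("No terminal status after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Fake status source standing in for repeated status requests (placeholder values).
        Iterator<String> fake = Arrays.asList("NotStarted", "Running", "Succeeded").iterator();
        Set<String> terminal = new HashSet<>(Arrays.asList("Succeeded", "Failed", "Canceled"));
        System.out.println(pollUntilTerminal(fake::next, terminal, 5, 10L)); // Succeeded
    }
}
```

Bounding the number of attempts keeps a stuck job from blocking the caller indefinitely.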

Examples

The following sections provide code snippets covering some of the most common client use cases:

Create a DeidentificationClient

DeidentificationClient deidentificationClient = new DeidentificationClientBuilder()
    .endpoint(Configuration.getGlobalConfiguration().get("DEID_ENDPOINT"))
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildClient();

De-identify text

String inputText = "Hello, my name is John Smith.";

DeidentificationContent content = new DeidentificationContent(inputText);
content.setOperationType(DeidentificationOperationType.SURROGATE);

DeidentificationResult result = deidentificationClient.deidentifyText(content);
System.out.println("De-identified output: " + (result != null ? result.getOutputText() : null));
// De-identified output: Hello, my name is <synthetic name>.

Begin a job to de-identify documents in Azure Storage

String storageLocation = "https://" + Configuration.getGlobalConfiguration().get("STORAGE_ACCOUNT_NAME") + ".blob.core.windows.net/" + Configuration.getGlobalConfiguration().get("STORAGE_CONTAINER_NAME");
DeidentificationJob job = new DeidentificationJob(
    new SourceStorageLocation(storageLocation, "data/example_patient_1"),
    new TargetStorageLocation(storageLocation, "_output")
        .setOverwrite(true)
);

job.setOperationType(DeidentificationOperationType.REDACT);

String jobName = Configuration.getGlobalConfiguration().get("DEID_JOB_NAME", "MyJob-" + Instant.now().toEpochMilli());
DeidentificationJob result = deidentificationClient.beginDeidentifyDocuments(jobName, job)
    .waitForCompletion()
    .getValue();
System.out.println(jobName + " - " + result.getStatus());

Get the status of a de-identification job

String jobName = Configuration.getGlobalConfiguration().get("DEID_JOB_NAME");
DeidentificationJob result = deidentificationClient.getJob(jobName);
System.out.println(jobName + " - " + result.getStatus());

List all de-identification jobs

PagedIterable<DeidentificationJob> result = deidentificationClient.listJobs();
for (DeidentificationJob job : result) {
    System.out.println(job.getJobName() + " - " + job.getStatus());
}

List all documents in a de-identification job

String jobName = Configuration.getGlobalConfiguration().get("DEID_JOB_NAME");
PagedIterable<DeidentificationDocumentDetails> result = deidentificationClient.listJobDocuments(jobName);
for (DeidentificationDocumentDetails documentDetails : result) {
    System.out.println(documentDetails.getInputLocation().getLocation() + " - " + documentDetails.getStatus());
}

Troubleshooting

A DeidentificationClient throws HttpResponseException when a request fails. For example, if you provide an invalid service URL, an HttpResponseException is thrown with an error indicating the cause of the failure. The following snippet handles the error by catching the exception and displaying additional information about it.

try {
    DeidentificationContent content = new DeidentificationContent("input text");
    deidentificationClient.deidentifyText(content);
} catch (HttpResponseException e) {
    System.out.println(e.getMessage());
    // Do something with the exception
}

Next steps

See the samples for several code snippets illustrating common patterns used in the de-identification service Java SDK. For more extensive documentation, see the de-identification service documentation.

Contributing

For details on contributing to this repository, see the contributing guide.