Azure Blob Storage with Java
Introduction
Azure Blob Storage is the object storage service of the Azure cloud. It helps build data lakes by providing a storage solution that can be used by many types of applications.
It provides an effective way to manage files and block storage components in the Azure cloud for cloud-native and mobile applications. It integrates easily with other cloud components to create powerful workload automation solutions.
Azure provides SDKs with APIs to manage Azure Blob Storage from different programming languages, e.g., Java, .NET, and Python. It also supports an HTTPS-based mechanism for storage manipulation.
Azure offers various pricing structures based on different use cases.
Problem Statement
An application may need file storage at different levels. In some cases, large files must be stored, and operations like read, write, delete, and archive are also required.
Let us take a real-world problem: an application must be developed that scans documents, stores the scanned documents, converts them to bitonal, runs OCR to read the textual content, and sends a notification to a web application for further processing.
Consider the diagram for the problem statement:
Along with the described piped operation, several concerns arise:
- Backup and recovery of storage and files, so that no data is lost in case of storage failure
- The piped operation needs to be fast and responsive, so that the user does not notice delays even when large files are processed
- Archival/purging of stale data and files
Additionally, the cross-cutting concerns of any application also apply, such as:
- Auto scaling based upon load
- Logging and health metrics
- Authorization and authentication
And so on.
This article covers only the storage part.
Solution
For the case discussed in the problem statement section, Azure Blob Storage is well suited to storing the files and building the piped application.
To develop the complete solution, refer to the articles: Azure Function App with Java and Azure Event Grid with Real World Example
The diagram below depicts the solution:
Azure Blob Storage is a fully managed binary large object (blob) storage solution and provides common operations on binary/file storage, such as:
- Create, read, write, and delete files
- Backup and restore
- Archiving files
- Soft delete of files
- Segregation of files into containers and folders
- Tagging and saving metadata along with a file (available in specific regions only)
- Authorization and authentication using an Azure account and connection string, encryption, CORS, Shared Access Signature, and networking configuration
- Data migration to/from
- Events
- Resource insight: health monitoring, metrics, and alerts
Azure Blob Storage is a fully managed service, so common concerns like backup, recovery, file operations, and memory management are handled by Azure and need little attention from the developer. Normally, Azure Blob Storage keeps three copies of each file within a region, so that if a fault domain fails the data can be recovered and the user does not notice any issue.
Terminologies Used
- Container: A top-level component which contains files and folders. A container cannot contain another container.
- Folder: A logical directory used to separate files and folders. A folder may contain folders or files.
- Block Blob/File: A file that is referred to by a unique name within a container/folder. It behaves like the directory tree of Windows/Linux, where one directory may have any number of files or directories with unique names.
- Shared Access Signature (SAS) Token: An access token for a storage component, e.g., a container, file, or folder.
- Microsoft Azure Storage Explorer: A tool to connect to Azure Blob Storage and manage it from your machine.
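Because folders are logical, a folder view can be derived purely from blob names that use "/" as a separator. The following is a minimal sketch of that idea, assuming a standard account where blob names form a flat list; the class name `BlobFolders`, the method `children`, and the sample blob names are hypothetical and not taken from the utility repository.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BlobFolders {

    /**
     * Returns the immediate children (files and sub-folders) under a folder
     * prefix, given a flat list of blob names. Sub-folders are returned with
     * a trailing "/" so they can be told apart from files.
     */
    public static Set<String> children(List<String> blobNames, String prefix) {
        Set<String> result = new TreeSet<>();
        for (String name : blobNames) {
            if (!name.startsWith(prefix)) {
                continue; // blob lives outside the requested folder
            }
            String rest = name.substring(prefix.length());
            if (rest.isEmpty()) {
                continue; // the name equals the prefix itself
            }
            int slash = rest.indexOf('/');
            // No further slash: a file. Otherwise keep only the first segment.
            result.add(slash < 0 ? rest : rest.substring(0, slash + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical blob names inside one container
        List<String> blobs = List.of(
                "scans/2021/invoice-1.pdf",
                "scans/2021/invoice-2.pdf",
                "scans/readme.txt",
                "bitonal/invoice-1.tif");
        System.out.println(children(blobs, ""));       // prints [bitonal/, scans/]
        System.out.println(children(blobs, "scans/")); // prints [2021/, readme.txt]
    }
}
```

This also shows why a folder cannot exist without at least one file under it: the folder is only a shared name prefix.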
Azure Blob Account — Azure Portal
To develop the solution, you first need to create an Azure account.
Once the Azure account is created, create the Azure Blob Storage resource by following the steps below:
1. Open Storage Accounts, either from resource search or from the Azure Portal menu
2. Click on New. A form will open:
Fill in the subscription details, then the resource group. Provide a storage account name of your choice, select the location, choose the performance level, and finally choose the Account Kind and Replication.
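The storage account name is not entirely free-form: Azure accepts only 3 to 24 characters, consisting of lowercase letters and digits. As a small sketch, an application that creates or configures accounts could check a candidate name up front; the class `StorageAccountNames` and the sample names are hypothetical.

```java
import java.util.regex.Pattern;

public class StorageAccountNames {

    // 3 to 24 characters, lowercase letters and digits only
    private static final Pattern VALID = Pattern.compile("[a-z0-9]{3,24}");

    /** Returns true if the name satisfies the storage account naming rule. */
    public static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("scannerdocs01")); // prints true
        System.out.println(isValid("Scanner-Docs"));  // prints false
    }
}
```

The check covers only the format; the name must also be unique across Azure, which only the portal or the management API can confirm.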
Once all fields are filled in, click Next to go to the Networking tab:
Choose the connectivity method; if access from the public internet is needed, choose Public endpoint. Select the default network routing. Then click Next to open the Data Protection tab:
This tab has the recovery and soft delete options. Choose as needed, and click Next to open the Advanced tab:
Set Secure transfer required to Enabled, Allow shared key access to Enabled, and the TLS version to 1.2, and enable encryption if required. Set Blob public access to Disabled and the Access tier to Cool, and enable NFS v3, Data Lake, and Azure Files if required. Click Next to open the Tags tab. Provide the required tags for the resource (optional), then review and create.
Once the storage account is created, open it and go to Settings -> Access Keys. Copy the access key; it will be required later when accessing Blob Storage to perform operations.
Azure Blob Storage Client — Java Code
Once the Azure Blob Storage account is created, you will have a connection string.
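A connection string is a semicolon-separated list of `Key=Value` pairs, typically `DefaultEndpointsProtocol`, `AccountName`, `AccountKey`, and `EndpointSuffix`. The sketch below, using only the JDK, splits such a string and derives the blob endpoint from it. The class `ConnectionStrings` and the placeholder values are hypothetical; the SDKs do this parsing internally, so this is for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConnectionStrings {

    /** Splits "Key=Value;Key=Value" pairs. Values may themselves contain '='. */
    public static Map<String, String> parse(String connectionString) {
        Map<String, String> parts = new LinkedHashMap<>();
        for (String pair : connectionString.split(";")) {
            if (pair.isBlank()) {
                continue; // tolerate a trailing ';'
            }
            // Split on the FIRST '=' only: Base64 account keys often end in '='
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed connection string segment");
            }
            parts.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return parts;
    }

    /** Derives the blob endpoint, e.g. https://myaccount.blob.core.windows.net */
    public static String blobEndpoint(Map<String, String> parts) {
        String protocol = parts.getOrDefault("DefaultEndpointsProtocol", "https");
        String suffix = parts.getOrDefault("EndpointSuffix", "core.windows.net");
        return protocol + "://" + parts.get("AccountName") + ".blob." + suffix;
    }

    public static void main(String[] args) {
        // Placeholder values only; a real key should come from configuration
        String cs = "DefaultEndpointsProtocol=https;AccountName=scannerdocs01;"
                + "AccountKey=bm90LWEtcmVhbC1rZXk=;EndpointSuffix=core.windows.net";
        System.out.println(blobEndpoint(parse(cs)));
        // prints https://scannerdocs01.blob.core.windows.net
    }
}
```

Since the connection string embeds the account key, it grants full access to the account and is best kept in server-side configuration or a secret store rather than in source code.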
Now, create the scanner application and add the Java utility for Azure Blob Storage from https://github.com/siddhivinayak-sk/azure-blob-storage.
Azure provides two APIs to manage Azure Blob Storage from a Java application:
1. azure-storage-blob: It provides an API that performs all operations on Azure Blob Storage through HTTPS-based REST calls.
Below is the Maven dependency:
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage-blob</artifactId>
<version>11.0.0</version>
</dependency>
2. azure-storage: It also provides storage access with more sophisticated communication and was observed to be a little faster than the other API provided for Azure Blob Storage.
Below is the Maven dependency:
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage</artifactId>
<version>8.6.6</version>
</dependency>
Using both APIs, utility methods for managing Azure Blob Storage have been created for operations such as:
- Upload
- Download
- Delete
- Generate SAS URL
- List available files
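The utility methods in the repository wrap the SDK calls. As a dependency-free illustration of the same operations, the sketch below builds the equivalent HTTPS REST requests with the JDK's `java.net.http` API (Java 11+). It assumes a container URL and a SAS token are already available and that blob names are URL-safe. The class `BlobRestRequests`, its method names, and the placeholder values are hypothetical and are not the repository's API.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BlobRestRequests {

    private final String containerUrl; // https://<account>.blob.core.windows.net/<container>
    private final String sasToken;     // SAS query string without the leading '?'

    public BlobRestRequests(String containerUrl, String sasToken) {
        this.containerUrl = containerUrl;
        this.sasToken = sasToken;
    }

    // Assumes blobName needs no URL encoding (no spaces or special characters)
    private URI blobUri(String blobName) {
        return URI.create(containerUrl + "/" + blobName + "?" + sasToken);
    }

    /** Upload: PUT on the blob URL; the blob type header marks it as a block blob. */
    public HttpRequest upload(String blobName, byte[] content) {
        return HttpRequest.newBuilder(blobUri(blobName))
                .header("x-ms-blob-type", "BlockBlob")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    /** Download: GET on the blob URL. */
    public HttpRequest download(String blobName) {
        return HttpRequest.newBuilder(blobUri(blobName)).GET().build();
    }

    /** Delete: DELETE on the blob URL. */
    public HttpRequest delete(String blobName) {
        return HttpRequest.newBuilder(blobUri(blobName)).DELETE().build();
    }

    /** List: GET on the container URL; needs a container-level SAS with list permission. */
    public HttpRequest list() {
        URI uri = URI.create(containerUrl + "?restype=container&comp=list&" + sasToken);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        BlobRestRequests requests = new BlobRestRequests(
                "https://scannerdocs01.blob.core.windows.net/scans",
                "sv=PLACEHOLDER&sig=PLACEHOLDER"); // placeholder SAS, not a real token
        HttpRequest upload = requests.upload("2021/invoice-1.pdf", new byte[] {1, 2, 3});
        System.out.println(upload.method() + " " + upload.uri().getPath());
        // prints PUT /scans/2021/invoice-1.pdf
    }
}
```

The requests are only built here, not sent. To execute one, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the list operation returns its result as XML.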
Alternate Solution
There are alternative approaches, such as using disks to store the files. If disks are used, however, accessibility and the cross-cutting concerns must be taken care of, especially backup and recovery in case of failure.
Additionally, each cloud vendor provides a similar kind of fully managed blob storage, which removes the need to worry about these common concerns.
Conclusion
Fully managed blob storage services are gaining popularity for storing data in cloud-native and mobile applications, because they provide most of the common features that any application requires.
Java developers can easily integrate Blob Storage and manage the storage from backend as well as front-end code.
Consider the following design for working with Azure in a secure manner:
The backend code holds all credentials that can access Blob Storage. It can generate a SAS token/URL valid for a specified period, which the front end uses to upload/download files directly. File transfers then do not pass through the backend; the file handling load is distributed between the front end and Blob Storage, and the application performs much faster.
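Because a SAS URL is valid only for the period chosen by the backend, a client may want to check the expiry before starting a large transfer. The expiry travels in the `se` (signed expiry) query parameter. The sketch below reads it with JDK classes only, assuming the full `yyyy-MM-ddTHH:mm:ssZ` form; the class `SasExpiry` and the sample URL are hypothetical.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Optional;

public class SasExpiry {

    /** Extracts the "se" (signed expiry) parameter from a SAS URL, if present. */
    public static Optional<Instant> expiry(String sasUrl) {
        String query = URI.create(sasUrl).getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String param : query.split("&")) {
            if (param.startsWith("se=")) {
                // The value is URL-encoded, e.g. 2030-01-01T00%3A00%3A00Z
                String value = URLDecoder.decode(param.substring(3), StandardCharsets.UTF_8);
                return Optional.of(Instant.parse(value));
            }
        }
        return Optional.empty();
    }

    /**
     * Returns true if the URL carries an expiry that is still in the future.
     * A URL without "se" is conservatively treated as not usable.
     */
    public static boolean isUsable(String sasUrl, Instant now) {
        return expiry(sasUrl).map(now::isBefore).orElse(false);
    }

    public static void main(String[] args) {
        // Hypothetical SAS URL with a placeholder signature
        String url = "https://scannerdocs01.blob.core.windows.net/scans/invoice-1.pdf"
                + "?sv=2020-08-04&se=2030-01-01T00%3A00%3A00Z&sp=r&sig=PLACEHOLDER";
        System.out.println(expiry(url).orElseThrow()); // prints 2030-01-01T00:00:00Z
        System.out.println(isUsable(url, Instant.now()));
    }
}
```

The check is only a convenience on the client side. The storage service enforces the expiry itself, so the backend should still keep the validity period as short as the use case allows.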
References
- Microsoft Blob Storage — https://azure.microsoft.com/en-in/services/storage/blobs/
- Java Blob Utility — https://github.com/siddhivinayak-sk/azure-blob-storage
- Azure Function — https://siddhivinayak-sk.medium.com/azure-function-app-with-java-548db9447c31
- Azure Event Grid — https://siddhivinayak-sk.medium.com/azure-event-grid-with-real-world-example-4b1a541b03d8
About the Author
Sandeep Kumar holds a Master of Computer Application degree and is a Java developer with 10 years of working experience. He has experience in the design and development of enterprise applications in the education, content, laboratory, and banking domains, and has received various appreciations for providing solutions, including a spot appreciation for a Glassfish to JBoss migration project. He holds a Google Cloud Developer certificate and has participated in OCI training. He is part of the HCL-ERS platform as a Sr. Lead Developer.