This was a single institution, multicenter study. We show our results on a diagnostic mammography dataset, a screening mammography dataset, and a curated dataset with small cancers.
To train our network, we collected a dataset from our hospital consisting of full-field digital mammograms (FFDMs) acquired on a Selenia Dimensions unit (Hologic) from January 2015 to December 2015. To create a suitably balanced training dataset, we collected consecutive patients who had been classified as BIRADS category 4 and had a histopathological diagnosis. This dataset consisted of 839 images, with 393 cancers. Images of the contralateral breast provided normal examples to the network.
Diagnostic Mammography Dataset
Our country does not have a formal mammography screening program. Thus, our test dataset (acquired from the same center as the training dataset) reflects the case distribution of diagnostic mammography practice. For this dataset, consecutive patients who underwent mammography from January 2018 to June 2018 were chosen. Patients who had received a BIRADS category of 4 or higher but had no histopathological report were excluded from the study. There were 2569 images with 243 cancers in this dataset. This is referred to as the DM dataset in further discussion.
Screening Mammography Dataset
This is an external dataset obtained from our Cancer Center, where opportunistic screening is offered to all eligible women. All patients who underwent mammography from January to April 2021 were selected, except those with BIRADS 4 or higher lesions that lacked histopathological confirmation. This dataset served as an external test set and also helped assess our effectiveness in a screening context. These images were acquired on a Hologic system. There were 2146 images with 59 cancers, and this is referred to as the SM dataset in further discussion.
Small Cancer Dataset
To assess the value of our network for small cancers, we assembled a dataset of patients with cancers smaller than 1 cm (the diameter was used for masses, and the longest dimension of the cluster was used for microcalcification clusters). There were 79 images in this dataset, with an average cancer size of 5.8 mm. This is called the SC dataset. The images in this dataset come from the two centers above.
All datasets were collected after obtaining ethical clearance from the Institutional Review Board (IRB) of the All India Institute of Medical Sciences, with reference number IEC-247/04.05.2018. The data were anonymized, and informed consent for the use of data was obtained from all patients participating in the study. All experimental protocols were approved by the IRB, All India Institute of Medical Sciences, New Delhi, and all methods were performed in accordance with relevant guidelines and regulations. Bounding box annotations were performed by 3 breast radiologists with 2, 8, and 15 years of experience in breast imaging. All images were 3328×4096 pixels in size.
The 3328×4096 input images were first cropped to remove the part of the mammogram that did not contain breast tissue. The cropped images were therefore of variable size (depending on the size of the breast) in one dimension and 4096 pixels in the other. These were then passed to the network.
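The cropping step can be illustrated with a minimal sketch. The text does not say how the breast region was identified, so the intensity threshold below is purely an assumption, and `crop_to_breast` is a hypothetical helper name. The sketch removes empty columns only, so the row count (4096 in the real images) is preserved.

```python
def crop_to_breast(image, threshold=0):
    """Drop leading/trailing columns that contain no pixel above `threshold`.

    `image` is a list of rows (lists of pixel intensities). The threshold
    rule is an assumed stand-in for the paper's unspecified cropping method.
    """
    n_cols = len(image[0])
    # Columns in which at least one pixel exceeds the background threshold.
    occupied = [c for c in range(n_cols)
                if any(row[c] > threshold for row in image)]
    if not occupied:
        # Nothing above threshold: return an unchanged copy.
        return [row[:] for row in image]
    left, right = occupied[0], occupied[-1] + 1
    # Keep every row; trim only the column range.
    return [row[left:right] for row in image]


toy = [
    [0, 0, 5, 7, 0],
    [0, 0, 0, 9, 0],
    [0, 0, 3, 0, 0],
]
cropped = crop_to_breast(toy)
```

On the toy image, the two empty columns on the left and the one on the right are removed while all three rows are kept.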
Our proposed architecture is shown in Fig. 2. The network involved the following steps: (1) generation of multiple scales, (2) systematic cropping of images, (3) passing through the base architecture, and (4) combination at test time.
Generation of multiple scales
The full resolution image is scaled to give images at 3 scales – X, 0.5X and 0.25X, where X is the original image.
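As a rough illustration of the three-scale pyramid, the sketch below downsamples by block averaging. The resampling method is not stated in the text, so block averaging is an assumption, and `downsample` and `build_pyramid` are hypothetical names.

```python
def downsample(image, factor):
    """Shrink an image (list of rows) by an integer factor using block means.

    Block averaging is an assumed choice; the paper does not name the
    interpolation it used.
    """
    rows, cols = len(image) // factor, len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def build_pyramid(image):
    """Return the image at the three scales X, 0.5X and 0.25X."""
    return {"X": image,
            "0.5X": downsample(image, 2),
            "0.25X": downsample(image, 4)}


toy = [
    [1, 1, 3, 3],
    [1, 1, 3, 3],
    [5, 5, 7, 7],
    [5, 5, 7, 7],
]
pyramid = build_pyramid(toy)
```

For an uncropped 3328×4096 image, the same factors give 1664×2048 at 0.5X and 832×1024 at 0.25X.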
Crops of size 0.25 times the original image are taken from the 3 scales. These crops were 1024 pixels in one dimension and variable in size in the other. These crops constitute the input of the network. Crops are taken systematically from the larger images, from right to left and top to bottom, ensuring that no part of the image is overlooked.
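The systematic cropping can be sketched as a window generator. The text does not say whether neighbouring crops overlap, so this sketch assumes a stride equal to the crop size, with the final window shifted back to the image edge when the size does not divide evenly. `tile_origins` and `crop_windows` are hypothetical names.

```python
def tile_origins(length, tile):
    """Start offsets of windows of size `tile` that together cover [0, length)."""
    if tile >= length:
        return [0]
    starts = list(range(0, length - tile + 1, tile))
    if starts[-1] + tile < length:
        # Shift one extra window back to the edge so nothing is missed.
        starts.append(length - tile)
    return starts


def crop_windows(width, height, crop_w, crop_h):
    """Windows as (x1, y1, x2, y2), scanned right to left, top to bottom."""
    xs = sorted(tile_origins(width, crop_w), reverse=True)  # right to left
    ys = tile_origins(height, crop_h)                       # top to bottom
    return [(x, y, min(x + crop_w, width), min(y + crop_h, height))
            for y in ys for x in xs]


# Uncropped 3328x4096 image: crops are 0.25 times the original size.
crop_w, crop_h = 3328 // 4, 4096 // 4           # 832 x 1024
full = crop_windows(3328, 4096, crop_w, crop_h)  # scale X
half = crop_windows(1664, 2048, crop_w, crop_h)  # scale 0.5X
quarter = crop_windows(832, 1024, crop_w, crop_h)  # scale 0.25X
```

Under these assumptions, an uncropped image yields 16 crops at X, 4 at 0.5X and 1 at 0.25X, all of the same pixel size.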
We chose YOLO v5 (You Only Look Once version 5)19 as the base architecture because it is fully convolutional and thus accepts images of any input size. Our ablation study to choose the base network is presented in the results section. We used YOLO v5 with a CSPDarknet backbone, a PANet neck with upsampling and concatenation of different layers, and a final convolutional YOLO head. The CSPDarknet is used for feature extraction, the PANet for feature fusion, and the YOLO layer for calculating class and objectness scores. Concatenating feature maps from different layers of the backbone takes advantage of the varying contextual information available at each layer. Successive layers correspond to increasingly large receptive fields, which helps in detecting very small masses.
Output combination at test time
At test time, full images were provided as input at 3 scales: X, 0.5X and 0.25X. Predictions were generated for each scale separately. These predictions were then combined using Weighted Boxes Fusion (WBF)20 as described by Wang et al. WBF uses the confidence scores from each set of predictions (here, one set per scale) to construct an "average" bounding box that captures the underlying ground truth box better than any individual prediction. We also experimented with simple non-maximum suppression (NMS) and NMS with thresholds, and found WBF to perform best.
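The fusion step can be illustrated with a simplified, single-class sketch of WBF. It is not the authors' implementation: the IoU threshold of 0.55, the input layout and the function names are all assumptions. Boxes predicted at 0.5X and 0.25X are first mapped back to full-resolution coordinates, then overlapping boxes are merged by confidence-weighted averaging.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def fuse(cluster):
    """Confidence-weighted mean box and mean confidence of a cluster."""
    total = sum(conf for _, conf in cluster)
    box = tuple(sum(b[i] * conf for b, conf in cluster) / total
                for i in range(4))
    return box, total / len(cluster)


def weighted_box_fusion(preds_per_scale, iou_thr=0.55):
    """preds_per_scale: {scale: [(box, conf), ...]}, boxes in that scale's pixels."""
    # Map every box back to full-resolution coordinates.
    boxes = [(tuple(v / scale for v in box), conf)
             for scale, preds in preds_per_scale.items()
             for box, conf in preds]
    boxes.sort(key=lambda p: p[1], reverse=True)
    clusters, fused = [], []
    for box, conf in boxes:
        best, best_iou = None, iou_thr
        for k, (fbox, _) in enumerate(fused):
            overlap = iou(box, fbox)
            if overlap > best_iou:
                best, best_iou = k, overlap
        if best is None:
            # No fused box overlaps enough: start a new cluster.
            clusters.append([(box, conf)])
            fused.append((box, conf))
        else:
            clusters[best].append((box, conf))
            fused[best] = fuse(clusters[best])
    # Down-weight boxes that only some of the scales agreed on.
    n = len(preds_per_scale)
    return [(fbox, fconf * min(len(c), n) / n)
            for c, (fbox, fconf) in zip(clusters, fused)]


preds = {
    1.0: [((100, 100, 200, 200), 0.9)],
    0.5: [((50, 50, 100, 100), 0.6)],
    0.25: [],
}
result = weighted_box_fusion(preds)
```

In this toy case the two boxes coincide after rescaling and merge into a single box. Its mean confidence is reduced because only two of the three scales detected it, which is how WBF rewards agreement rather than simply keeping the top-scoring box as NMS does.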
As in the default implementation of YOLO v5, we used binary cross-entropy with logits loss for the calculation of objectness scores and stochastic gradient descent (SGD) as the optimizer. We kept the batch size at 16 and the initial learning rate at 0.01. All calculations were performed on a high-performance computing cluster with 32 GB V100 GPUs.
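For reference, binary cross-entropy with logits applies the sigmoid inside the loss, which lets it be computed in a numerically stable form. The sketch below is a plain-Python illustration of that formula for a single prediction, not the YOLO v5 code. The `train_config` dictionary simply restates the settings given above under hypothetical key names.

```python
import math


def bce_with_logits(logit, target):
    """Stable form of -[t*log(s(x)) + (1-t)*log(1-s(x))], with s the sigmoid."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Settings reported in the text; the key names are illustrative only.
train_config = {"optimizer": "SGD", "batch_size": 16, "initial_lr": 0.01}

loss_uncertain = bce_with_logits(0.0, 1.0)   # logit 0 means probability 0.5
loss_confident = bce_with_logits(4.0, 1.0)   # confident and correct
loss_wrong = bce_with_logits(4.0, 0.0)       # confident and wrong
```

A confident correct prediction gives a small loss and a confident wrong one a large loss; at a logit of 0 the loss equals log 2.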