---
license: apache-2.0
---
## Ishara: ASL Fingerspelling Recognition
Ishara is a deep learning model designed for accurate recognition of American Sign Language (ASL) fingerspelling. It is based on a hybrid architecture that combines **Squeezeformer** and **Conformer** blocks with **Conv1D layers** for efficient feature extraction from hand, face, and pose landmark data.
This model is a submission to the Google American Sign Language Fingerspelling Recognition (ASLFR) competition on Kaggle; its scores are listed under Results.

---
## Model Description
Ishara processes sequences of normalized hand, face, and pose landmarks to predict fingerspelled words at the character level. The architecture is designed to handle temporal variability and missing data using a combination of:
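- **Squeezeformer blocks**: A Conformer variant with a simplified, more compute-efficient block layout, used for efficient sequence modeling.
- **Conformer blocks**: Combine multi-head self-attention (long-range context) with depthwise convolution (local patterns) for richer feature extraction.
- **Conv1D layers**: For initial temporal feature extraction.

The model outputs per-frame character scores and is trained with **Connectionist Temporal Classification (CTC)** loss, which aligns these frame-level outputs with the shorter target character sequence without requiring frame-level labels.

---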
## Dataset
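The model was trained and evaluated on the dataset provided by the [Google ASLFR Competition](https://www.kaggle.com/competitions/asl-fingerspelling), which contains per-frame MediaPipe landmark coordinates extracted from fingerspelling videos. The model uses the following subset of that data: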
- **Hand landmarks**: 21 points each for left and right hands.
- **Face landmarks**: 40 key points.
- **Pose landmarks**: 10 key points.
- **Labels**: Text sequences representing fingerspelled words.
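
The card mentions normalized landmarks and missing data but does not include the preprocessing code. The sketch below shows one common approach under stated assumptions: each frame holds the (x, y) coordinates of the 92 landmarks listed above (21 + 21 + 40 + 10), undetected landmarks are stored as NaN, and normalization is per-sequence standardization. The function name and array layout are hypothetical, not the authors' pipeline.

```python
import numpy as np

# Landmark counts from the Dataset section: left hand, right hand, face, pose.
NUM_LANDMARKS = 21 + 21 + 40 + 10  # 92


def normalize_sequence(frames: np.ndarray) -> np.ndarray:
    """Standardize a landmark sequence and zero-fill missing values (hypothetical helper).

    frames: float array of shape (num_frames, NUM_LANDMARKS, 2) holding (x, y);
            NaN marks landmarks that were not detected in a frame (assumption).
    Returns a (num_frames, NUM_LANDMARKS * 2) float32 array.
    """
    # Per-coordinate mean and std over all frames and landmarks, ignoring NaNs.
    mean = np.nanmean(frames, axis=(0, 1), keepdims=True)
    std = np.nanstd(frames, axis=(0, 1), keepdims=True)
    normalized = (frames - mean) / (std + 1e-6)  # epsilon avoids division by zero
    # Missing landmarks become 0, i.e. the sequence mean after standardization.
    normalized = np.nan_to_num(normalized, nan=0.0)
    # Flatten each frame into one feature vector.
    return normalized.reshape(frames.shape[0], -1).astype(np.float32)
```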
---
## Usage
### Inference with TFLite
The model is available in TensorFlow Lite format for real-time inference. To use the model:
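```python
import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate its tensors
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

# Look up the input and output tensor metadata
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Placeholder input: replace with a preprocessed landmark sequence whose
# shape and dtype match input_details[0]["shape"] and input_details[0]["dtype"]
input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

# Read the raw model output (it still has to be decoded into characters)
output_data = interpreter.get_tensor(output_details[0]["index"])
print("Raw model output:", output_data)
```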
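The raw output still has to be converted into text. The card does not document the output format; assuming it is a `(num_frames, num_classes)` array of per-frame CTC scores (drop any leading batch dimension first) with the blank token at index 0, greedy best-path decoding can be sketched in NumPy as below. The `id_to_char` mapping is hypothetical, since the real character map depends on the training setup. Greedy decoding is the simplest CTC decoder; beam search can be more accurate at a higher cost.

```python
import numpy as np


def ctc_greedy_decode(scores: np.ndarray, id_to_char: dict, blank: int = 0) -> str:
    """Best-path CTC decoding of a (num_frames, num_classes) score array.

    id_to_char maps non-blank class indices to characters (hypothetical mapping);
    blank is the index of the CTC blank class (assumed to be 0).
    """
    best_path = np.argmax(scores, axis=-1).tolist()  # most likely class per frame
    chars = []
    prev = None
    for idx in best_path:
        # Merge consecutive repeats of the same class, then drop blanks.
        if idx != prev and idx != blank:
            chars.append(id_to_char[idx])
        prev = idx
    return "".join(chars)


# Toy example: 4 classes (blank + "a", "b", "c"), one-hot scores per frame.
# The blank between the two runs of class 1 keeps both a's.
frames = [1, 1, 0, 1, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(np.eye(4)[frames], {1: "a", 2: "b", 3: "c"}))  # aabc
```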
---
### Training Workflow
You can replicate training with TensorFlow. The model is built with `get_model` and trained with `model.fit`:
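```python
from model import get_model  # model-building function from the project's model module

# Build the model
model = get_model(
    dim=256,
    num_conv_squeeze_blocks=2,
    num_conv_conform_blocks=2,
    kernel_sizes=[11, 5, 3],
    num_conv_per_block=3,
    dropout_rate=0.2,
)

# train_dataset, val_dataset, N_EPOCHS, validation_callback, lr_callback and
# WeightDecayCallback are assumed to be defined elsewhere in the training code
history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=N_EPOCHS,
    callbacks=[validation_callback, lr_callback, WeightDecayCallback()],
)
```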
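The card does not say how `kernel_sizes` and `num_conv_per_block` are combined inside `get_model`. If, as their matching lengths suggest, each convolutional block stacks three stride-1, undilated Conv1D layers with kernel sizes 11, 5 and 3, the number of frames one output step can see follows from the standard receptive-field formula. This is a hypothetical helper, not part of the project code, and it only describes the convolutional path: the attention layers in the Squeezeformer and Conformer blocks can attend to the whole sequence.

```python
from math import prod


def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in frames) of stacked 1D convolutions without dilation.

    RF = 1 + sum_i (k_i - 1) * prod_{j < i} s_j
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)  # assume stride 1 everywhere
    rf = 1
    for i, k in enumerate(kernel_sizes):
        # Each layer widens the window by (k - 1) steps of the layer's input spacing.
        rf += (k - 1) * prod(strides[:i])
    return rf


print(receptive_field([11, 5, 3]))      # 17 frames for one block (assumed layout)
print(receptive_field([11, 5, 3] * 2))  # 33 frames for two such blocks in sequence
```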
---
## Model Evaluation
The model's performance is evaluated using:
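- **Levenshtein Distance**: The minimum number of character insertions, deletions, and substitutions needed to turn a prediction into its reference text; it measures character-level accuracy.
- **Character Error Rate (CER)**: The total Levenshtein distance divided by the total number of reference characters (lower is better).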
- **Real-Time Inference Speed**: Assessed on 1080p video inputs.
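
The two text metrics above can be sketched in plain Python. This assumes CER is computed over the whole evaluation set (total edit distance divided by total reference length) and that the normalized Levenshtein score follows the Kaggle competition's (N − D) / N definition, where N is the total number of reference characters and D the total edit distance, so it equals 1 − CER and higher is better. `corpus_scores` is a hypothetical helper name.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from an empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when the characters match
            ))
        prev = curr
    return prev[-1]


def corpus_scores(references, predictions):
    """Return (CER, normalized Levenshtein score) over a whole evaluation set."""
    n = sum(len(ref) for ref in references)  # N: total reference characters
    d = sum(levenshtein(ref, pred) for ref, pred in zip(references, predictions))  # D: total edits
    return d / n, (n - d) / n


print(levenshtein("kitten", "sitting"))  # 3
print(corpus_scores(["hello", "world"], ["helo", "word"]))  # (0.2, 0.8)
```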
---
## Results
- **Normalized Levenshtein Distance**: 0.728
- **Inference Speed**: 200 ms
- **Model Size**: 17.9 MB
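
Assuming the reported 0.728 follows the ASLFR competition's definition of the normalized Levenshtein score, where N is the total number of reference characters and D is the total Levenshtein distance over all predictions, it converts directly into a character error rate of roughly 27 edits per 100 reference characters:

```latex
\text{score} = \frac{N - D}{N} = 1 - \frac{D}{N} = 1 - \mathrm{CER}
\quad\Longrightarrow\quad
\mathrm{CER} = 1 - 0.728 = 0.272
```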
---
## Deployment
The model is optimized for real-time deployment with TensorFlow Lite, which makes it suitable for integration into mobile and embedded ASL recognition applications.

---
## License
This model is released under the [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).

---
## Acknowledgments
- **Google ASLFR Competition**: For providing the dataset.
- **TensorFlow Team**: For the deep learning framework.
- **Paper Authors**: For inspiring the architecture:
  - [Squeezeformer](https://arxiv.org/abs/2206.00888)
  - [Conformer](https://arxiv.org/abs/2005.08100)
---
## Citation
If you use this model, please consider citing:
```bibtex
@misc{ishara_asl,
  title={Ishara: ASL Fingerspelling Recognition},
  author={Niharika Gupta and Tanay Srinivasa and Tanmay Nanda and Zoya Ghoshal},
  year={2024},
  howpublished={\url{https://huggingface.co/ishara-asl}}
}
```
---
## Contact
For questions or collaboration, feel free to reach out:
- **Tanmay Nanda**: [email protected]