Research Architectures

Explore the custom machine learning models and neural networks driving Ishaara's real-time Indian Sign Language detection.

GRU Sequence-to-Sequence Visualization
Model V1

MediaPipe Landmark Architecture

GRU Sequence-to-Sequence

Built on MediaPipe's landmark detection, this model reaches 99% test accuracy across 6 action classes, covering basic conversational signs including 'hello', 'how are you', 'sorry', 'welcome', and 'thank you'.

Landmark extraction from student video data
Ready-to-deploy model implementation
Depth detection capabilities
ChatGPT integration for sentence creation
Automated real-time sign processing
99% Accuracy
6 Action Classes
Real-time Processing
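To make the approach above concrete, here is a minimal NumPy sketch of the GRU recurrence at the heart of Model V1: each video frame is reduced to a flattened landmark vector (1662 values per frame is a common choice when concatenating MediaPipe Holistic pose, face, and hand landmarks, assumed here), a GRU consumes the sequence, and the final hidden state is classified into one of the 6 actions. The layer sizes and random weights are illustrative assumptions; the actual trained weights, seq2seq decoding, and training loop are omitted.

```python
import numpy as np

def gru_step(x, h, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """Single GRU update: gates are computed from input x and previous state h."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ Wz + h @ Uz + bz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)   # candidate state
    return (1.0 - z) * h + z * h_tilde

def classify_sequence(frames, params, Wout, bout):
    """Run a GRU over per-frame landmark vectors; classify from the final state."""
    h = np.zeros(params["Uz"].shape[0])
    for x in frames:
        h = gru_step(x, h, params["Wz"], params["Uz"], params["bz"],
                     params["Wr"], params["Ur"], params["br"],
                     params["Wh"], params["Uh"], params["bh"])
    logits = h @ Wout + bout
    e = np.exp(logits - logits.max())
    return e / e.sum()                              # softmax over the 6 action classes

rng = np.random.default_rng(0)
F, H, C = 1662, 64, 6            # features per frame, hidden units, classes (sizes assumed)
params = {k: rng.normal(0, 0.05, (F, H)) for k in ("Wz", "Wr", "Wh")}
params.update({k: rng.normal(0, 0.05, (H, H)) for k in ("Uz", "Ur", "Uh")})
params.update({k: np.zeros(H) for k in ("bz", "br", "bh")})
Wout, bout = rng.normal(0, 0.05, (H, C)), np.zeros(C)

clip = rng.normal(size=(30, F))  # a 30-frame clip of flattened landmarks
probs = classify_sequence(clip, params, Wout, bout)
print(probs.shape)               # one probability per action class
```

Because landmarks are only a few thousand floats per frame (versus hundreds of thousands of pixels), this design keeps inference cheap enough for real-time use.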
ConvLSTM Architecture Visualization
Model V2

Hybrid Neural Network

ConvLSTM Architecture

A hybrid approach combining convolutional layers with LSTM cells for video classification. The model balances spatial and temporal feature extraction and was trained extensively on our custom ISL dataset.

Hybrid CNN-LSTM architecture
Comprehensive video classification
CPU-optimized training process
Efficient lightweight model storage
Low-latency prediction capabilities
8 Hrs Training
Optimized Build
Edge Inference
Conv3D Video Classification Visualization
Model V3

Spatio-Temporal Feature Learning

Conv3D Video Classification

An ambitious implementation using 3D convolutions for full spatio-temporal feature learning. While currently limited by hardware constraints, this model represents our long-term direction for high-fidelity ISL recognition.

Requires 32GB+ System RAM
High-performance GPU Mandatory
Powerful Multi-core CPU Required
Massive 3GB Baseline Model Size
Deep Spatio-Temporal Extraction
3GB File Size
Heavy Compute
GPU Optimized
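To illustrate why this architecture is so compute-hungry, here is a naive NumPy sketch of a single 3D convolution: the kernel slides along time as well as height and width, so every output value touches a full spatio-temporal neighborhood. The clip and kernel sizes are small illustrative assumptions; real models stack many such layers with many filters per layer, which is where the multi-gigabyte footprint and GPU requirement come from.

```python
import numpy as np

def conv3d_valid(vol, kernel):
    """Naive valid 3D convolution over a (time, height, width) volume."""
    kt, kh, kw = kernel.shape
    T, H, W = vol.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # each output value sums over a kt x kh x kw spatio-temporal block
                out[t, i, j] = np.sum(vol[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(2)
clip = rng.normal(size=(16, 24, 24))   # a 16-frame grayscale clip (sizes assumed)
kernel = rng.normal(size=(3, 3, 3))    # one 3x3x3 spatio-temporal filter
fmap = conv3d_valid(clip, kernel)
print(fmap.shape)                      # the feature map shrinks along every axis
```

Compared with the 2D-conv-plus-recurrence design of Model V2, the cost here scales with the kernel's temporal extent at every layer, which is what pushes the RAM, CPU, and GPU requirements listed above.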