Federated Learning

A sourced reference on Federated Learning.

What is federated learning?

Federated learning is a machine learning approach in which a model is trained across multiple decentralized devices or servers that each hold local data samples, without exchanging the raw data itself. Introduced by Google researchers (McMahan et al.; first posted to arXiv in 2016 and published in 2017), it enables collaborative model training while keeping sensitive data on-device. [Source: Google Research]

Sources
Communication-Efficient Learning of Deep Networks from Decentralized Data
academic · arXiv (Google Brain / Carnegie Mellon University) · 2017-02-17

How does federated learning work step by step?

A central server distributes a global model to participating clients. Each client trains locally on its own data and sends only model updates (gradients or weights) back to the server. The server aggregates these updates using algorithms like FedAvg to improve the global model without accessing raw data. [Source: Google AI Blog]
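The round described above can be sketched as a small simulation. This is a minimal illustration, not a production system: the one-dimensional linear model, the synthetic client data, and the names `local_train`, `aggregate`, and `make_client_data` are all assumptions made for the example.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    """Client side: a few epochs of SGD on a 1-D linear model y = w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def aggregate(client_weights, client_sizes):
    """Server side: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def make_client_data(n, rng):
    """Hypothetical local dataset drawn from y = 2x + 1 with small noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]

rng = random.Random(0)
clients = [make_client_data(n, rng) for n in (20, 50, 30)]

global_weights = (0.0, 0.0)
for round_num in range(20):
    # 1. The server sends the current global model to every client.
    # 2. Each client trains on its own data; only weights are returned.
    updates = [local_train(global_weights, data) for data in clients]
    # 3. The server combines the returned weights into a new global model.
    global_weights = aggregate(updates, [len(d) for d in clients])

print(global_weights)  # should end up close to the true (2.0, 1.0)
```

The server code never reads `clients` directly; it only sees the weight tuples the clients return, which is the property the paragraph above describes.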

Sources
Communication-Efficient Learning of Deep Networks from Decentralized Data
academic · arXiv (Google Brain / Carnegie Mellon University) · 2017-02-17

What is the FedAvg algorithm in federated learning?

FedAvg (Federated Averaging) is the foundational aggregation algorithm for federated learning, proposed by McMahan et al. (2017). Each selected client runs several steps of local SGD, and the server then averages the resulting model weights, weighted by local dataset size, to produce an updated global model. Because clients do more local computation per round, FedAvg needs significantly fewer communication rounds than a baseline that averages a single gradient step per round. [Source: arXiv / Google Research]
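The weighted average can be written out as a formula. The notation here is an assumption chosen for illustration: S_t is the set of clients taking part in round t, n_k is the number of local examples on client k, and w_{t+1}^k is the model client k returns after local training.

```latex
w_{t+1} \;=\; \sum_{k \in S_t} \frac{n_k}{n}\, w_{t+1}^{k},
\qquad n \;=\; \sum_{k \in S_t} n_k
```

As a worked example, with two clients holding 100 and 300 examples and returning scalar weights 1.0 and 2.0, the new global weight is (100/400)(1.0) + (300/400)(2.0) = 0.25 + 1.50 = 1.75, so the larger client pulls the average toward its own value.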

Sources
Communication-Efficient Learning of Deep Networks from Decentralized Data
academic · arXiv (Google Brain / Carnegie Mellon University) · 2017-02-17

How is federated learning different from traditional centralized machine learning?

In traditional centralized ML, all training data is collected on a single server. Federated learning keeps data on local devices and shares only model updates. This reduces privacy risk, avoids uploading raw data, and lowers regulatory compliance barriers, but it introduces challenges such as statistical heterogeneity across clients and the communication overhead of repeated update rounds. [Source: ACM TIST]

Sources
Federated Machine Learning: Concept and Applications
academic · ACM Transactions on Intelligent Systems and Technology · 2019-01-01
Communication-Efficient Learning of Deep Networks from Decentralized Data
academic · arXiv (Google Brain / Carnegie Mellon University) · 2017-02-17

What are the main types of federated learning?

Federated learning has three primary types: horizontal (same features, different users), vertical (different features, same users), and federated transfer learning (little overlap in either features or users). Horizontal FL is most common in mobile applications; vertical FL suits cross-organizational scenarios such as banking and healthcare. [Source: ACM TIST]
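The difference between horizontal and vertical partitioning is easiest to see on a toy table. In the sketch below, the user names, feature names, values, and the helpers `horizontal_split` and `vertical_split` are all hypothetical; the point is only how the same table is divided among parties.

```python
# Toy table: each key is a user, each inner key a feature (values are made up).
records = {
    "alice": {"age": 34, "income": 72, "purchases": 5},
    "bob":   {"age": 51, "income": 58, "purchases": 2},
    "carol": {"age": 27, "income": 91, "purchases": 9},
    "dave":  {"age": 45, "income": 40, "purchases": 1},
}

def horizontal_split(table, user_groups):
    """Horizontal FL: each party holds ALL features for a disjoint set of users."""
    return [{u: dict(table[u]) for u in group} for group in user_groups]

def vertical_split(table, feature_groups):
    """Vertical FL: each party holds SOME features for the same set of users."""
    return [
        {u: {f: row[f] for f in group} for u, row in table.items()}
        for group in feature_groups
    ]

# Two phones, each with complete records for its own users.
phones = horizontal_split(records, [["alice", "bob"], ["carol", "dave"]])

# A bank and a retailer, each with different columns about the same users.
bank, retailer = vertical_split(records, [["age", "income"], ["purchases"]])

print(phones[0])
print(bank["carol"], retailer["carol"])
```

Federated transfer learning corresponds to the case where neither split applies cleanly, because the parties share few users and few features.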

Sources
Federated Machine Learning: Concept and Applications
academic · ACM Transactions on Intelligent Systems and Technology · 2019-02-28

What are the main challenges of federated learning?

Key challenges include statistical heterogeneity (non-IID data across clients), system heterogeneity (varying device capabilities), communication efficiency, and security vulnerabilities such as model poisoning and inference attacks. The academic literature consistently identifies these as primary barriers to production deployment, and the NIST AI Risk Management Framework treats the underlying security and privacy risks as core concerns for trustworthy AI. [Source: NIST]

Sources
AI Risk Management Framework (AI RMF 1.0)
primary · National Institute of Standards and Technology (NIST) · 2023-01-26
Advances and Open Problems in Federated Learning
academic · arXiv (Google Research et al.) · 2021-03-09

How does federated learning improve data privacy?

Federated learning minimizes privacy risk by never centralizing raw data, reducing exposure under regulations like GDPR and HIPAA. The European Data Protection Board recognizes on-device processing as a privacy-enhancing technology. Combined with differential privacy or secure aggregation, it can provide formal privacy guarantees. [Source: European Data Protection Board]

Sources
Guidelines 4/2019 on Article 25 Data Protection by Design and by Default
official · European Data Protection Board · 2020-10-20

What role does differential privacy play in federated learning?

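Differential privacy (DP) bounds what can be learned about any individual by clipping each model update and adding calibrated random noise, either on the client before the update leaves the device (local DP) or at the server during aggregation (central DP). This limits how much the server, or anyone inspecting the final model, can infer about individual training examples. Google's open-source TensorFlow Privacy library and Apple's on-device learning systems both apply differential privacy to bound per-user privacy loss with a formal epsilon guarantee. [Source: Google Research / Apple]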
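A common way to apply this idea is to clip each update to a fixed L2 norm and then add Gaussian noise scaled to that norm. The sketch below assumes that recipe; the function names and the `clip_norm` and `noise_multiplier` values are illustrative, and converting a noise multiplier into an actual epsilon requires a privacy accountant that is not shown here.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip, then add Gaussian noise with std = noise_multiplier * clip_norm."""
    rng = rng or random.Random()
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]

raw_update = [3.0, 4.0]                              # L2 norm is 5.0
clipped = clip_update(raw_update, clip_norm=1.0)     # direction kept, norm cut to 1.0
noisy = privatize_update(raw_update, rng=random.Random(42))

print(clipped)
print(noisy)
```

Clipping bounds how much any single client can move the model, which is what lets the added noise be calibrated to a known sensitivity.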

Sources
Learning with Privacy at Scale
official · Apple Machine Learning Research · 2017-12-01

What is secure aggregation in federated learning?

Secure aggregation is a cryptographic protocol that allows a server to compute the sum of client model updates without seeing any individual update. Bonawitz et al. (2017) at Google introduced the practical protocol used in production, combining secret sharing and pairwise masking so that individual updates remain private even from the aggregating server. [Source: Google Research / ACM CCS]
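The masking idea can be illustrated with a toy example: every pair of clients shares a random mask, one adds it and the other subtracts it, so the masks cancel in the server's sum. This is only a sketch of that cancellation. It omits the key agreement, secret sharing, and dropout recovery of the real protocol, and the names `pairwise_masks`, `mask_update`, and `server_sum` are hypothetical.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo a fixed integer

def pairwise_masks(num_clients, vector_len, seed=0):
    """Draw one shared random mask for each pair (i, j) with i < j.

    A real protocol derives each mask from a key agreed between the two
    clients; a single seeded RNG is used here purely for illustration.
    """
    rng = random.Random(seed)
    return {
        (i, j): [rng.randrange(MODULUS) for _ in range(vector_len)]
        for i in range(num_clients)
        for j in range(i + 1, num_clients)
    }

def mask_update(client_id, update, masks, num_clients):
    """Add masks shared with higher-numbered peers, subtract those with lower."""
    masked = list(update)
    for other in range(num_clients):
        if other == client_id:
            continue
        pair = (min(client_id, other), max(client_id, other))
        sign = 1 if client_id < other else -1
        masked = [(m + sign * p) % MODULUS for m, p in zip(masked, masks[pair])]
    return masked

def server_sum(masked_updates):
    """The server adds the masked vectors; the pairwise masks cancel out."""
    return [sum(col) % MODULUS for col in zip(*masked_updates)]

updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]  # integer-encoded, made up
masks = pairwise_masks(len(updates), vector_len=3)
masked = [mask_update(i, u, masks, len(updates)) for i, u in enumerate(updates)]
total = server_sum(masked)

print(total)  # equals the element-wise sum of the unmasked updates
```

Each masked vector looks random on its own, yet the sum the server computes is the true sum of the updates.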

Sources
Practical Secure Aggregation for Privacy-Preserving Machine Learning
academic · ACM CCS 2017 (Google Research) · 2017-10-30

What security threats exist in federated learning systems?

Primary threats include model poisoning attacks (malicious clients submitting manipulated updates), gradient inversion attacks (reconstructing training data from shared gradients), and free-rider attacks (clients that obtain the global model without contributing genuine updates). Peer-reviewed research identifies Byzantine-robust aggregation, anomaly detection, and differential privacy as key mitigation strategies, and NIST's AI Risk Management Framework treats security and privacy as core characteristics of trustworthy AI. [Source: NIST]
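One simple form of Byzantine-robust aggregation is the coordinate-wise median, which a single extreme update cannot move far. The sketch below compares it with a plain mean; the update values are invented, and the median is only one of several robust rules discussed in the literature.

```python
from statistics import mean, median

def mean_aggregate(updates):
    """Plain averaging: one extreme update can shift every coordinate."""
    return [mean(col) for col in zip(*updates)]

def median_aggregate(updates):
    """Coordinate-wise median: robust to a minority of extreme updates."""
    return [median(col) for col in zip(*updates)]

honest = [[0.9, -1.1], [1.0, -1.0], [1.1, -0.9], [1.0, -1.0]]
poisoned = [[100.0, 100.0]]          # a single malicious client
updates = honest + poisoned

print(mean_aggregate(updates))       # pulled far away from the honest values
print(median_aggregate(updates))     # stays at the honest consensus
```

The trade-off is that robust rules discard information, so they can slow convergence when all clients are in fact honest.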

Sources
AI Risk Management Framework (AI RMF 1.0)
primary · National Institute of Standards and Technology (NIST) · 2023-01-26
Advances and Open Problems in Federated Learning
academic · arXiv (Google Research et al.) · 2021-03-09

How is federated learning used in healthcare?

Federated learning enables hospitals to jointly train diagnostic AI models without sharing patient records, directly addressing HIPAA constraints. The NIH-funded FeTS (Federated Tumor Segmentation) initiative and Intel's collaboration with 29 international hospitals demonstrated FL-trained brain tumor models that matched or exceeded centrally trained baselines. [Source: Nature Medicine]

Sources
The Federated Tumor Segmentation (FeTS) Challenge
primary · National Institutes of Health / PubMed Central · 2022-02-15

What are the most common real-world use cases for federated learning?

Leading production use cases include Google's Gboard next-word prediction, Apple's Siri voice model improvements, financial fraud detection across banks, and cross-hospital medical imaging. The European Commission's AI Act explicitly supports privacy-preserving techniques like FL for high-risk AI applications. [Source: European Commission / Google AI]

Sources
Regulation (EU) 2024/1689 — Artificial Intelligence Act
primary · European Commission / Official Journal of the European Union · 2024-07-12

How does federated learning work on mobile devices?

On mobile devices, training occurs locally when the device is idle, charging, and on Wi-Fi. Google's production FL system for Android trains Gboard models on millions of devices, aggregates updates server-side using FedAvg and secure aggregation, then pushes improved models back—all without user data leaving the device. [Source: Google AI Blog]
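The eligibility rule in the first sentence can be sketched as a simple check. The `DeviceState` fields and the `eligible_for_training` function are hypothetical names for illustration and do not correspond to any real Android or Gboard API.

```python
from dataclasses import dataclass

@dataclass
class DeviceState:
    """Hypothetical snapshot of the conditions a scheduler might check."""
    is_idle: bool
    is_charging: bool
    on_wifi: bool

def eligible_for_training(state: DeviceState) -> bool:
    """A device takes part only when all three conditions hold."""
    return state.is_idle and state.is_charging and state.on_wifi

fleet = {
    "phone_a": DeviceState(is_idle=True, is_charging=True, on_wifi=True),
    "phone_b": DeviceState(is_idle=True, is_charging=False, on_wifi=True),
    "phone_c": DeviceState(is_idle=False, is_charging=True, on_wifi=True),
}
participants = [name for name, state in fleet.items() if eligible_for_training(state)]

print(participants)
```

Requiring all three conditions keeps training from draining the battery, using mobile data, or slowing the phone while it is in use.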

Sources
Practical Secure Aggregation for Privacy-Preserving Machine Learning
academic · ACM CCS 2017 (Google Research) · 2017-10-30

How is communication efficiency addressed in federated learning?

Communication is the primary bottleneck in FL because sending full model updates from many clients over slow or metered links is expensive. Techniques include gradient compression, quantization, sparsification, and client subsampling. Research from Google and collaborators shows that structured and sketched updates can reduce uplink communication cost by up to two orders of magnitude with minimal accuracy loss. [Source: arXiv / Google Research]
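Two of the techniques named above, sparsification and quantization, can be sketched in a few lines. These are generic textbook versions under assumed parameters (top-k selection and uniform 8-bit quantization), not the specific structured or sketched updates from the cited paper, and all function names are illustrative.

```python
def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries as sorted (index, value) pairs."""
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return sorted((i, update[i]) for i in ranked[:k])

def densify(pairs, length):
    """Rebuild a dense vector, filling dropped entries with zero."""
    dense = [0.0] * length
    for i, v in pairs:
        dense[i] = v
    return dense

def quantize(update, bits=8):
    """Uniformly map values to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(update), max(update)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in update]
    return codes, lo, step

def dequantize(codes, lo, step):
    """Approximate the original values from the integer codes."""
    return [lo + c * step for c in codes]

update = [0.02, -1.5, 0.3, 0.0, 2.25, -0.01]

sparse = top_k_sparsify(update, k=2)          # only 2 of 6 entries are sent
codes, lo, step = quantize(update, bits=8)    # 1 byte per entry plus lo and step
restored = dequantize(codes, lo, step)

print(sparse)
print(codes)
```

Sparsification sends fewer values and quantization sends fewer bits per value; in both cases the server works with an approximation of the original update.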

Sources
Federated Learning: Strategies for Improving Communication Efficiency
academic · arXiv (Google Research) · 2016-10-08
Advances and Open Problems in Federated Learning
academic · arXiv (Google Research et al.) · 2021-03-09

Does federated learning help with GDPR compliance?

Federated learning supports GDPR compliance by applying data minimisation (Article 5(1)(c)) and data protection by design and by default (Article 25), since raw personal data never leaves the user's device. The European Data Protection Board recognizes privacy-enhancing technologies, including on-device processing, as valid measures. Model updates can still reveal information about individuals, however, so they may themselves need protection, for example through secure aggregation or differential privacy. [Source: European Data Protection Board]

Sources
Guidelines 4/2019 on Article 25 Data Protection by Design and by Default
official · European Data Protection Board · 2020-10-20
Regulation (EU) 2024/1689 — Artificial Intelligence Act
primary · European Commission / Official Journal of the European Union · 2024-07-12

What open-source frameworks are available for federated learning?

Major open-source FL frameworks include Google's TensorFlow Federated (TFF), FedML, Intel's OpenFL, OpenMined's PySyft, and NVIDIA FLARE, which is often used in healthcare. TFF and NVIDIA FLARE are frequently described as among the more mature options for production; PySyft emphasizes cryptographic privacy guarantees. [Source: IEEE / NVIDIA]

Sources
NVIDIA FLARE: Federated Learning Application Runtime Environment
official · NVIDIA Corporation · 2022-03-15
TensorFlow Federated: Machine Learning on Decentralized Data
official · Google / TensorFlow · 2019-03-01

What is the difference between cross-silo and cross-device federated learning?

Cross-device FL involves millions of mobile or IoT devices with intermittent connectivity and limited compute (e.g., Google Gboard). Cross-silo FL involves a small number of reliable organizations—such as hospitals or banks—with stable connections and larger local datasets. Each setting demands different aggregation strategies and fault-tolerance mechanisms. [Source: arXiv / Google Research]

Sources
Advances and Open Problems in Federated Learning
academic · arXiv (Google Research et al.) · 2021-03-09

What is the non-IID data problem in federated learning?

Non-IID (non-independent and identically distributed) data means each client's local dataset reflects only that client's own behavior, creating statistical heterogeneity across clients. This can cause FedAvg to converge slowly, diverge, or underperform compared with centralized training. Techniques such as FedProx, SCAFFOLD, and personalized FL were designed to address this challenge. [Source: arXiv / Carnegie Mellon University]
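FedProx addresses client drift by adding a proximal term to each client's local objective, which penalizes moving far from the current global model. The sketch below shows that term on a one-parameter model y = w * x; the data, step size, and the function name `fedprox_local_train` are assumptions for illustration, and setting `mu=0` recovers plain local training.

```python
def fedprox_local_train(global_w, data, mu=0.1, lr=0.05, steps=200):
    """Gradient descent on  F_k(w) + (mu / 2) * (w - global_w)**2.

    F_k is the client's mean squared error (halved) for the model y = w * x.
    """
    w = global_w
    for _ in range(steps):
        grad_loss = sum((w * x - y) * x for x, y in data) / len(data)
        grad_prox = mu * (w - global_w)   # pulls w back toward the global model
        w -= lr * (grad_loss + grad_prox)
    return w

data = [(1.0, 3.0), (2.0, 6.0)]   # a client whose own optimum is w = 3
global_w = 0.0

plain = fedprox_local_train(global_w, data, mu=0.0)   # drifts all the way to 3
prox = fedprox_local_train(global_w, data, mu=1.0)    # stops closer to global_w

print(plain, prox)
```

A larger `mu` keeps clients closer to the global model, which limits drift under non-IID data at the cost of less local progress per round.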

Sources
Federated Optimization in Heterogeneous Networks (FedProx)
academic · arXiv (Carnegie Mellon University / Google) · 2020-01-29
Advances and Open Problems in Federated Learning
academic · arXiv (Google Research et al.) · 2021-03-09

What is personalized federated learning?

Personalized federated learning (pFL) adapts the global model to each client's local data distribution rather than returning a single shared model. Approaches include local fine-tuning, meta-learning (e.g., Per-FedAvg), mixture models, and federated multi-task learning. Published results indicate that pFL can substantially outperform a single global model when client data is strongly non-IID. [Source: arXiv / Stanford]
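Local fine-tuning, the simplest of these approaches, can be sketched as follows: each client starts from the shared model and takes a few gradient steps on its own data. The two clients, their data, and the helper names `mse` and `fine_tune` are invented for the example.

```python
def mse(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def fine_tune(global_w, data, lr=0.05, steps=20):
    """Start from the shared model and take a few local gradient steps."""
    w = global_w
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

clients = {
    "client_a": [(1.0, 1.0), (2.0, 2.0)],   # local relationship y = 1 * x
    "client_b": [(1.0, 3.0), (2.0, 6.0)],   # local relationship y = 3 * x
}
global_w = 2.0   # a compromise that fits neither client well

personal = {name: fine_tune(global_w, data) for name, data in clients.items()}

for name, data in clients.items():
    print(name, mse(global_w, data), mse(personal[name], data))
```

Here the shared model sits between the two clients, so each personalized model fits its own data better than the global one does.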

Sources
Advances and Open Problems in Federated Learning
academic · arXiv (Google Research et al.) · 2021-03-09

How are regulators treating federated learning in AI governance frameworks?

The EU AI Act (2024) and the NIST AI Risk Management Framework (2023) both encourage privacy-preserving ML techniques for high-risk AI applications. The European Commission has funded FL research under Horizon Europe, and the FDA's 2021 AI/ML action plan for medical devices cites decentralized learning as a pathway for continuous learning under regulatory oversight. [Source: European Commission / NIST / FDA]

Sources
Regulation (EU) 2024/1689 — Artificial Intelligence Act
primary · European Commission / Official Journal of the European Union · 2024-07-12
AI Risk Management Framework (AI RMF 1.0)
primary · National Institute of Standards and Technology (NIST) · 2023-01-26
Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices
primary · U.S. Food and Drug Administration · 2021-01-12