Jogg Blog
# Learn From the Papers That Built AI
### How Jogg's Research Paper Request Feature Brings the Literature to Life

*by the Jogg Team | MokingBird Oy*

---

There is a gap in how most people learn AI and machine learning.

Courses teach you to use the tools. Blog posts summarize the highlights. Tutorials walk you through the code. But somewhere along the way, the actual research papers — the documents that contain the original ideas, the hard-won experimental results, the precise formulations that define the field — get left behind.

This is a problem, because if you want to truly understand why transformers work the way they do, why certain training tricks matter, or why one optimization algorithm outperforms another in specific contexts, you need to go back to the source: the papers.

Jogg was built with this belief at its core. And our **Paper Request** feature is one of the most direct expressions of it.

---

## What Is the Paper Request Feature?

The Paper Request feature lets you do something simple but powerful: **ask Jogg to generate a curated quiz set from a specific research paper**.

Instead of waiting for us to add a paper to our standard library, you can submit a paper of your choosing — by arXiv ID, title, or DOI — and Jogg will generate a set of high-quality quiz questions derived directly from that paper's content, methodology, and contributions.

This means your Jogg learning experience can be tailored to exactly the papers that matter most to your specific goals:
- The paper behind the model architecture you are implementing at work
- The foundational work referenced in your PhD thesis
- The latest transformer variant from NeurIPS that everyone on your team is discussing
- A classic paper you have always meant to read properly but never got around to

---

## Why Research Papers?

Let us be honest about something: research papers are hard to read. They are dense, notation-heavy, often assume familiarity with dozens of prior works, and sometimes bury their most important contribution in a technical appendix between two figures.

But the difficulty is worth it. Here is why.

### Papers Contain the Real Reasoning

A blog post about attention mechanisms will tell you that transformers use self-attention. The paper *Attention Is All You Need* tells you *why* the authors made every specific architectural decision — the positional encoding choices, the multi-head design, the specific dimension sizes, and what they tried that did not work. That reasoning is what separates deep understanding from surface familiarity.

### Papers Are the Ground Truth

When you are in a technical interview or a research discussion and someone asks "but why does multi-head attention project into lower-dimensional subspaces?", the answer that comes from actually having read the paper is qualitatively different from the answer synthesized from a summary.
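The paper answers this directly. The base model splits d_model = 512 across h = 8 heads of d_k = d_v = 64, so the total cost stays close to single-head attention at full width, while each head can attend to a different representation subspace. Below is a minimal NumPy sketch of that split, assuming random projection weights, no masking, and no biases (function names are ours, not the paper's):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k: (..., seq, d_k); v: (..., seq, d_v). Scores are scaled by sqrt(d_k).
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    return softmax(scores) @ v

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    # x: (seq, d_model); each w_*: (d_model, d_model)
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split_heads(t):
        # (seq, d_model) -> (n_heads, seq, d_head): each head sees a d_head-dim subspace.
        return t.reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = (split_heads(x @ w) for w in (w_q, w_k, w_v))
    heads = scaled_dot_product_attention(q, k, v)  # (n_heads, seq, d_head)
    # Concatenate the heads back to d_model and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ w_o

rng = np.random.default_rng(0)
d_model, n_heads, seq = 512, 8, 10
x = rng.standard_normal((seq, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads)
print(out.shape)  # (10, 512); each head worked in 512 // 8 = 64 dimensions
```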

### Papers Build Research Literacy

The ability to read, evaluate, and critically discuss research papers is one of the most valuable skills in AI/ML — and one of the least taught. Using Jogg's Papers Quest and Paper Request features trains this skill actively, not passively.

---

## The Curated Paper Library — What We Already Have

Before you request a custom paper, it is worth knowing what is already in our curated library. Jogg ships with **20 landmark papers** spanning the full AI/ML stack, from foundational work to cutting-edge research. Highlights include:

### Foundation and Architecture Papers
**Attention Is All You Need** (Vaswani et al., 2017)
The paper that introduced the Transformer architecture and changed NLP forever. Nearly every modern LLM builds on this work. Jogg includes questions covering self-attention, multi-head attention, positional encoding, and the encoder-decoder design.
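As one example of the detail these questions probe, the paper's positional encoding is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A pure-Python sketch of that formula (the function name is ours):

```python
import math

def sinusoidal_positional_encoding(pos, d_model):
    # Even dimensions use sine and odd dimensions use cosine; the wavelengths
    # form a geometric progression from 2*pi up to 10000 * 2*pi.
    pe = [0.0] * d_model
    for i in range(0, d_model, 2):  # i plays the role of "2i" in the paper
        angle = pos / (10000 ** (i / d_model))
        pe[i] = math.sin(angle)
        if i + 1 < d_model:
            pe[i + 1] = math.cos(angle)
    return pe

print(sinusoidal_positional_encoding(0, 8))  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

The encoding is fixed rather than learned; the paper reports that learned positional embeddings gave nearly identical results and chose the sinusoidal version in the hope that it would extrapolate to longer sequences.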

**Deep Residual Learning for Image Recognition** (He et al., 2015)
ResNet reformulated layers to learn residual functions through identity shortcut connections, which made it possible to train networks as deep as 152 layers on ImageNet. Jogg's questions explore why residual connections work, their mathematical interpretation, and their influence on subsequent architectures.
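The core idea fits in a few lines: a block computes y = relu(F(x) + x), so if the identity is the best mapping, the layers only need to drive the residual F(x) toward zero. A minimal NumPy sketch, assuming a two-layer fully connected residual branch with no biases or batch norm (a simplification of the paper's convolutional blocks):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # The branch learns a residual F(x) = W2 relu(W1 x); the identity shortcut
    # adds x back before the final nonlinearity: y = relu(F(x) + x).
    f = w2 @ relu(w1 @ x)
    return relu(f + x)

rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal(d)
w1 = rng.standard_normal((d, d))
w2 = np.zeros((d, d))  # a residual branch that outputs zero...
y = residual_block(x, w1, w2)
print(np.allclose(y, relu(x)))  # True: ...leaves the block as (a ReLU of) the identity
```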

**BERT: Pre-training of Deep Bidirectional Transformers** (Devlin et al., 2018)
Bidirectional pre-training with masked language modeling. Questions cover the difference between BERT and GPT-style models, MLM vs CLM pre-training, and fine-tuning approaches.
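BERT's masking recipe is a classic detail: of the positions chosen for prediction, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The sketch below selects each position independently with probability 0.15 (a common simplification) and marks unselected labels with `None`, which is our own convention rather than BERT's:

```python
import random

MASK = "[MASK]"
IGNORE = None  # label for positions the MLM loss skips (our convention)

def bert_mask(tokens, vocab, rng, mask_prob=0.15):
    # Returns (inputs, labels): labels hold the original token at selected
    # positions and IGNORE elsewhere, so only selected positions are predicted.
    inputs, labels = list(tokens), [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

rng = random.Random(0)
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
tokens = ["the", "cat", "sat", "on", "the", "mat"] * 50
inputs, labels = bert_mask(tokens, vocab, rng)
print(inputs[:12])
```

Keeping some selected tokens unchanged or randomized reduces the mismatch between pre-training, where [MASK] appears, and fine-tuning, where it never does.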

**Language Models are Few-Shot Learners** / GPT-3 (Brown et al., 2020)
The paper that demonstrated in-context learning at scale. Questions probe few-shot prompting, the relationship between model scale and capability, and what in-context learning actually is.
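In-context learning means the "training examples" live entirely in the prompt and the weights are never updated. A small sketch that builds a few-shot prompt in the style of the paper's English-to-French illustration (the helper name and separator are our choices):

```python
def few_shot_prompt(task_description, examples, query, sep=" => "):
    # The task is specified only by text: a description, K solved examples,
    # and an unfinished final line for the model to complete.
    lines = [task_description]
    lines += [f"{src}{sep}{tgt}" for src, tgt in examples]
    lines.append(f"{query}{sep}".rstrip())
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
print(prompt)
```

Dropping the examples list gives the zero-shot setting and keeping one gives one-shot, which are the three regimes the paper compares.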

**ImageNet Classification with Deep CNNs** / AlexNet (Krizhevsky et al., 2012)
The paper that started the deep learning revolution. Fundamental questions about ReLU, dropout regularization, data augmentation, and GPU-accelerated training.
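Two of those ingredients are easy to show directly. AlexNet used the non-saturating ReLU and a dropout scheme that zeroes each hidden unit with probability 0.5 during training and multiplies outputs by 0.5 at test time. A NumPy sketch of that scheme (function names are ours):

```python
import numpy as np

def relu(x):
    # ReLU: the non-saturating nonlinearity AlexNet used in place of tanh.
    return np.maximum(x, 0.0)

def dropout_alexnet(h, train, rng, p=0.5):
    if train:
        # Training: zero each unit's output independently with probability p.
        keep = rng.random(h.shape) >= p
        return h * keep
    # Test: use every unit but scale outputs by (1 - p), 0.5 in the paper,
    # so expected activations match those seen during training.
    return h * (1.0 - p)

rng = np.random.default_rng(0)
h = relu(rng.standard_normal(10_000))
train_mean = dropout_alexnet(h, True, rng).mean()
test_mean = dropout_alexnet(h, False, rng).mean()
print(train_mean, test_mean)
```

Most modern frameworks use "inverted" dropout instead, scaling the kept activations by 1 / (1 - p) during training so that inference needs no rescaling; both keep the expected activation the same.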

### Efficiency and Optimization Papers
**LoRA: Low-Rank Adaptation of Large Language Models** (Hu et al., 2021)
One of the most widely used approaches to parameter-efficient fine-tuning. Questions cover the mathematical intuition behind low-rank decomposition, when to use LoRA versus full fine-tuning, and rank selection.
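The mechanics are compact: the pretrained weight W stays frozen, and the layer adds a trainable update B A of rank r, scaled by alpha / r. As in the paper, A gets a random Gaussian initialization and B starts at zero, so training begins from the unmodified model. A NumPy sketch, where the 0.01 init scale and the class name are our assumptions:

```python
import numpy as np

class LoRALinear:
    # Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A,
    # scaled by alpha / r: h = W x + (alpha / r) * B A x.
    def __init__(self, w, r, alpha, rng):
        d_out, d_in = w.shape
        self.w = w                                      # frozen, never updated
        self.a = rng.standard_normal((r, d_in)) * 0.01  # random Gaussian init
        self.b = np.zeros((d_out, r))                   # zero init, so B A = 0 at start
        self.scale = alpha / r

    def __call__(self, x):
        return self.w @ x + self.scale * (self.b @ (self.a @ x))

    def trainable_params(self):
        return self.a.size + self.b.size

rng = np.random.default_rng(0)
d, r = 1024, 8
w = rng.standard_normal((d, d))
layer = LoRALinear(w, r=r, alpha=16, rng=rng)
x = rng.standard_normal(d)
print(np.allclose(layer(x), w @ x))            # True: the adapted layer starts identical to W
print(layer.trainable_params(), "vs", w.size)  # 16384 vs 1048576
```

After training, B A can be merged into W, which is why LoRA adds no inference latency.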

**FlashAttention** (Dao et al., 2022)
IO-aware exact attention that achieves a 2-4x speedup. Questions cover the memory bottleneck in standard attention, tiling strategies, and why attention is limited by reads and writes to GPU high-bandwidth memory (memory-bound) rather than by arithmetic (compute-bound).
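The algorithmic trick that makes tiling possible is an "online" softmax: keys and values are processed block by block, and running statistics are rescaled whenever a new block raises the maximum score. This NumPy sketch shows only that math; it does not model GPU SRAM, kernel fusion, or the backward-pass recomputation the paper also relies on:

```python
import numpy as np

def standard_attention(q, k, v):
    # Materializes the full (n x n) score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v

def tiled_attention(q, k, v, block=16):
    # For each query row keep a running max m, running normalizer l, and running
    # unnormalized output o; the full score matrix is never stored.
    n, d = q.shape
    m = np.full((n, 1), -np.inf)
    l = np.zeros((n, 1))
    o = np.zeros((n, v.shape[1]))
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)                   # scores for this block only
        m_new = np.maximum(m, s.max(axis=1, keepdims=True))
        correction = np.exp(m - m_new)              # rescale earlier partial sums
        p = np.exp(s - m_new)
        l = l * correction + p.sum(axis=1, keepdims=True)
        o = o * correction + p @ vb
        m = m_new
    return o / l

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 32)) for _ in range(3))
print(np.allclose(tiled_attention(q, k, v), standard_attention(q, k, v)))  # True
```

The result matches standard attention up to floating-point error, which is what "exact" means here: the speedup comes from fewer memory round-trips, not from approximation.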

**Adam: A Method for Stochastic Optimization** (Kingma & Ba, 2014)
The default optimizer for a large share of modern neural network training runs. Questions cover adaptive learning rates, momentum, bias correction, and when Adam underperforms SGD.
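The full update rule is short enough to write out. This pure-Python sketch follows the paper's algorithm and default hyperparameters (alpha = 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8), operating on plain lists for clarity; the function name is ours:

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update over lists of scalar parameters; t is the 1-based step count.
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # first moment: running mean of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g  # second moment: running mean of squared gradients
        m_hat = m_i / (1 - beta1 ** t)           # bias correction, since m and v start at zero
        v_hat = v_i / (1 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v

# Thanks to bias correction, the first step has size close to lr whether the
# gradient is huge or tiny: Adam's step size is roughly invariant to gradient scale.
big, _, _ = adam_step([0.0], [1000.0], [0.0], [0.0], t=1, lr=0.1)
small, _, _ = adam_step([0.0], [0.001], [0.0], [0.0], t=1, lr=0.1)
print(big[0], small[0])  # both approximately -0.1
```

Without the two bias-correction lines, the first step with these defaults would be about 0.1 / sqrt(0.001), roughly 3.16 times lr.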

**Batch Normalization** (Ioffe & Szegedy, 2015)
Internal covariate shift and how normalization addresses it. Questions cover the mechanics, why it enables higher learning rates, and the relationship between batch size and BN behavior.
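In training mode, the transform normalizes each feature with the mini-batch mean and variance, then applies a learned scale gamma and shift beta. A NumPy sketch of that forward pass; the epsilon value is our assumption, and inference-time use of running population statistics is omitted:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature using statistics of the
    # current mini-batch, then apply the learned scale gamma and shift beta.
    mu = x.mean(axis=0)
    var = x.var(axis=0)  # biased (population) variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))
```

Because mu and var are estimated from the batch, smaller batches make them noisier, which is the batch-size dependence the questions explore.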

### Generative Models
**Denoising Diffusion Probabilistic Models** (Ho et al., 2020)
A foundation of modern diffusion-based image generation (e.g., DALL-E 2 and Stable Diffusion). Questions cover the forward/reverse diffusion process, the denoising objective, and the relationship to score matching.
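The forward process has a closed form, q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), so any noise level can be sampled in one shot. This sketch uses the paper's linear schedule (beta from 1e-4 to 0.02 over T = 1000 steps), indexed from 0 instead of 1; the helper names are ours:

```python
import math
import random

T = 1000
# Linear variance schedule from 1e-4 to 0.02 over T steps, as in the DDPM paper.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar[t] = prod_{s<=t} (1 - beta_s); it shrinks toward 0 as t grows.
alpha_bar = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)

def q_sample(x0, t, rng):
    # Closed-form forward process: x_t = sqrt(alpha_bar_t) x0 + sqrt(1 - alpha_bar_t) eps.
    # The simplified DDPM loss trains a network to predict eps from (x_t, t).
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bar[t]
    xt = [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei for xi, ei in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]
x_early, _ = q_sample(x0, 0, rng)     # almost x0
x_late, _ = q_sample(x0, T - 1, rng)  # almost pure Gaussian noise
print(round(alpha_bar[0], 4), alpha_bar[-1])
```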

**Generative Adversarial Networks** (Goodfellow et al., 2014)
The original GAN paper. Questions cover the minimax game, mode collapse, training instability, and the theoretical guarantees.
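For reference, the paper's value function, the optimal discriminator for a fixed generator, and the resulting training criterion for the generator (restated from the paper, notation lightly adapted):

```latex
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

C(G) = \max_D V(D, G) = -\log 4 + 2\,\mathrm{JSD}\big(p_{\text{data}} \,\|\, p_g\big)
```

Since the Jensen-Shannon divergence is non-negative, C(G) reaches its global minimum of -log 4 exactly when p_g = p_data. In practice the paper suggests training G to maximize log D(G(z)) instead, because log(1 - D(G(z))) saturates early in training when D easily rejects generated samples.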

### Retrieval and Knowledge
**RAG: Retrieval-Augmented Generation** (Lewis et al., 2020)
The foundational paper for RAG systems. Questions cover dense passage retrieval, the combination of parametric and non-parametric memory, and practical implications for reducing hallucinations.
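The two pieces fit together as retrieve-then-marginalize: passages are scored by inner product with a query embedding, and the RAG-Sequence output probability sums over the top-k passages, p(y | x) ≈ sum_z p(z | x) p(y | x, z). The paper uses a DPR bi-encoder retriever and a BART generator; here both are replaced by toy random embeddings and hypothetical generator likelihoods:

```python
import numpy as np

def retrieve_top_k(query_vec, doc_vecs, k):
    # Dense retrieval: score every passage by inner product with the query
    # embedding and keep the k highest-scoring passages.
    scores = doc_vecs @ query_vec
    top = np.argsort(-scores)[:k]
    # p(z | x): softmax over the retrieved passages' scores.
    w = np.exp(scores[top] - scores[top].max())
    return top, w / w.sum()

def rag_sequence_prob(doc_probs, gen_probs):
    # RAG-Sequence marginalization: p(y | x) ~= sum_z p(z | x) * p(y | x, z),
    # where gen_probs[j] is the generator's probability of the whole output y
    # given the j-th retrieved passage.
    return float(np.dot(doc_probs, gen_probs))

rng = np.random.default_rng(0)
doc_vecs = rng.standard_normal((100, 16))  # toy passage embeddings
query_vec = rng.standard_normal(16)        # toy query embedding
top, p_z = retrieve_top_k(query_vec, doc_vecs, k=5)
# Hypothetical generator likelihoods p(y | x, z) for each retrieved passage.
gen_probs = np.array([0.30, 0.10, 0.05, 0.20, 0.01])
print(top, rag_sequence_prob(p_z, gen_probs))
```

The retrieved passages act as non-parametric memory: swapping the document index changes what the model can draw on without retraining the generator.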

**Word2Vec / Efficient Estimation of Word Representations** (Mikolov et al., 2013)
CBOW and Skip-gram word embedding models. Questions cover the intuition behind semantic relationships in embedding space and the training objectives.
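Skip-gram's training data comes from a simple rule: predict each context word from the center word. This sketch generates those (center, context) pairs with a fixed window; the paper instead samples a smaller effective window per word so that nearby words count more, and CBOW reverses the direction by predicting the center from its context:

```python
def skipgram_pairs(tokens, window=2):
    # Every (center, context) pair within the window becomes one training example.
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```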

### Scaling and Alignment
**Scaling Laws for Neural Language Models** (Kaplan et al., 2020)
Empirical relationships between model size, data size, compute, and performance. Questions cover power-law relationships, compute-optimal training (and how the later Chinchilla results revised this paper's recommendations), and practical guidance for allocating resources.
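A power law is easiest to feel with numbers. This sketch evaluates the paper's parameter-count law L(N) = (N_c / N)^alpha_N, with its fitted constants alpha_N ≈ 0.076 and N_c ≈ 8.8e13 (N counts non-embedding parameters; the law assumes training to convergence on enough data, and the constants are specific to the paper's dataset and tokenization):

```python
# Parameter-count scaling law from Kaplan et al. (2020): L(N) = (N_c / N) ** alpha_N,
# with loss in nats per token. Constants are the paper's fitted values.
ALPHA_N = 0.076
N_C = 8.8e13

def loss_from_params(n_params):
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10):
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.3f} nats/token")

# Each 10x increase in N multiplies the loss by the same constant factor.
print(f"{10 ** -ALPHA_N:.3f}")  # 0.839, i.e. about 16% lower loss per 10x parameters
```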

**Training language models to follow instructions** / InstructGPT (Ouyang et al., 2022)
RLHF methodology and the foundation for ChatGPT. Questions cover the three-stage RLHF process, reward model training, and PPO in the context of language model alignment.
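The reward model stage reduces to a pairwise ranking loss. Labelers rank K responses to a prompt (K between 4 and 9 in the paper), and every pair contributes -log(sigmoid(r_w - r_l)), averaged over all K-choose-2 pairs. A pure-Python sketch that takes reward-model scores as plain numbers:

```python
import math
from itertools import combinations

def pairwise_reward_loss(rewards_ranked):
    # rewards_ranked: reward-model scores for K responses to the same prompt,
    # ordered from most to least preferred by labelers. combinations() keeps
    # that order, so in each pair the first element is the preferred response.
    pairs = list(combinations(rewards_ranked, 2))
    total = 0.0
    for r_w, r_l in pairs:
        total += math.log1p(math.exp(-(r_w - r_l)))  # equals -log(sigmoid(r_w - r_l))
    return total / len(pairs)

print(round(pairwise_reward_loss([0.0, 0.0]), 4))  # 0.6931: no preference learned yet (log 2)
print(pairwise_reward_loss([3.0, 1.0, -2.0]) < pairwise_reward_loss([-2.0, 1.0, 3.0]))  # True
```

The trained reward model then supplies the scalar reward that PPO optimizes in the third stage.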

**Constitutional AI** (Bai et al., 2022)
AI feedback (RLAIF) and the use of constitutional principles for harmlessness. Questions cover scalable oversight, reducing dependence on human labelers, and the distinction between supervised and RL phases.

### Multimodal and Other Architectures
**CLIP** (Radford et al., 2021)
Vision-language alignment through contrastive learning on 400M image-text pairs. Questions cover the contrastive objective, zero-shot classification, and the implications for multimodal AI.
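The contrastive objective is a symmetric cross-entropy over an n x n similarity matrix whose diagonal holds the matching image-text pairs. A NumPy sketch in the spirit of the paper's own pseudocode, assuming precomputed embeddings and a fixed temperature of 0.07 (CLIP learns its temperature, initialized to that value):

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cross_entropy_rows(logits, labels):
    # Mean cross-entropy where each row is a distribution over columns.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def clip_loss(image_emb, text_emb, temperature=0.07):
    # The i-th image should pick out the i-th text among all n texts, and vice versa.
    img, txt = l2_normalize(image_emb), l2_normalize(text_emb)
    logits = img @ txt.T / temperature             # (n, n) scaled cosine similarities
    labels = np.arange(len(img))                   # matching pairs sit on the diagonal
    loss_i = cross_entropy_rows(logits, labels)    # image -> text
    loss_t = cross_entropy_rows(logits.T, labels)  # text -> image
    return (loss_i + loss_t) / 2

rng = np.random.default_rng(0)
text = rng.standard_normal((8, 32))
aligned = text + 0.01 * rng.standard_normal((8, 32))  # "images" close to their captions
random_imgs = rng.standard_normal((8, 32))
print(clip_loss(aligned, text) < clip_loss(random_imgs, text))  # True
```

Zero-shot classification reuses the same similarity: embed a caption such as "a photo of a {label}" for each class and pick the class whose text embedding is most similar to the image.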

**MoE: Outrageously Large Neural Networks** (Shazeer et al., 2017)
Sparse mixture-of-experts layers. Questions cover gating mechanisms, sparse activation, expert routing, and the computational efficiency argument for MoE.
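The gating idea: compute a logit per expert, keep only the top k, softmax over those, and evaluate only the selected experts. This NumPy sketch follows the paper's KeepTopK-then-softmax structure but omits its tunable gating noise and load-balancing losses; the experts here are hypothetical linear maps standing in for feed-forward networks:

```python
import numpy as np

def top_k_gating(x, w_gate, k):
    # Keep the top-k gate logits and set the rest to -inf so the softmax gives
    # them exactly zero weight (KeepTopK followed by Softmax).
    logits = x @ w_gate
    kth = np.sort(logits)[-k]
    masked = np.where(logits >= kth, logits, -np.inf)
    g = np.exp(masked - masked.max())
    return g / g.sum()

def moe_layer(x, w_gate, experts, k=2):
    gates = top_k_gating(x, w_gate, k)
    # Sparse activation: only experts with nonzero gate values are evaluated.
    active = np.nonzero(gates)[0]
    return sum(gates[i] * experts[i](x) for i in active), active

rng = np.random.default_rng(0)
d, n_experts = 16, 8
expert_weights = [rng.standard_normal((d, d)) for _ in range(n_experts)]
experts = [lambda x, w=w: x @ w for w in expert_weights]  # hypothetical experts
x = rng.standard_normal(d)
y, active = moe_layer(x, rng.standard_normal((d, n_experts)), experts, k=2)
print(len(active), "of", n_experts, "experts evaluated")  # 2 of 8
```

Total parameters grow with the number of experts, while per-token compute grows only with k, which is the efficiency argument in a nutshell.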

**PPO: Proximal Policy Optimization** (Schulman et al., 2017)
The policy gradient algorithm used in InstructGPT-style RLHF. Questions cover the clipped surrogate objective, how clipping approximates TRPO's trust-region constraint using only first-order optimization, and how PPO fits into the broader RLHF pipeline.
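The clipped surrogate objective for a single sample is min(r A, clip(r, 1 - eps, 1 + eps) A), where r is the probability ratio between the new and old policies and A is the advantage estimate. A pure-Python sketch using the paper's eps = 0.2:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s). Taking the min of the unclipped and
    # clipped terms removes any incentive to move the ratio outside
    # [1 - eps, 1 + eps], a first-order stand-in for a trust-region constraint.
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

print(ppo_clip_objective(1.5, advantage=1.0))   # 1.2: gain capped for a good action
print(ppo_clip_objective(0.5, advantage=-1.0))  # -0.8: no extra gain from pushing a bad action below 1 - eps
print(ppo_clip_objective(0.5, advantage=1.0))   # 0.5: unclipped term is smaller, so the gradient still pushes the ratio up
```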

---

## How Paper Requests Work

When you submit a paper request, here is what happens behind the scenes:

### 1. Paper Ingestion
The paper is fetched (from arXiv or uploaded directly) and processed through our content extraction pipeline. This extracts the full text, mathematical notation, figures, tables, and reference structure.

### 2. Content Processing
Our system processes the paper in multiple passes:
- Identifying key contributions and main claims
- Extracting experimental results and their context
- Mapping the paper to our 9-lane topic taxonomy
- Identifying prerequisite knowledge needed to understand the paper

### 3. Question Generation
MokingbirdDataGen, our question-generation system, produces questions across multiple question types:
- **Conceptual questions** — testing understanding of the core ideas
- **Technical questions** — testing understanding of methodology and implementation
- **Comparative questions** — relating this paper's contribution to prior or subsequent work
- **Application questions** — testing ability to apply the paper's insights to new scenarios

Questions are generated at multiple difficulty levels, from accessible entry points for readers new to the topic to expert-level questions about subtle experimental choices and limitations.

### 4. Quality Review
Generated questions are run through our quality pipeline — filtering for accuracy, clarity, and educational value. Questions that do not meet our bar are rejected or revised before they are added to your request set.

### 5. Your Question Set
The resulting question set is added to your Papers Quest library, tagged with the paper's metadata, and becomes available immediately. You can quiz yourself on it, include it in custom quizzes, or share it with study partners.

---

## Who Uses Paper Requests?

### PhD Students and Researchers
You need to truly master the papers in your literature review — not just cite them, but deeply understand their methods, assumptions, and limitations. Paper Request lets you build exam-quality quiz sets for every paper you need to internalize.

**Scenario:** You are writing a thesis chapter on efficient transformers. You submit FlashAttention, FlashAttention-2, Ring Attention, and Longformer as paper requests and build a custom study set covering the full landscape of efficient attention mechanisms.

### ML Engineers Staying Current
You want to understand the papers behind the tools you use every day — not just how to use them, but why they were designed the way they were.

**Scenario:** Your team is evaluating whether to use LoRA or QLoRA for fine-tuning a production model. You request both papers, quiz yourself on the technical tradeoffs, and come to the decision-making meeting with actual depth.

### Interview Candidates
Technical interviews at top AI companies frequently go deep into research papers, and research-focused roles at labs like Google DeepMind, Meta AI, and OpenAI typically expect candidates to be conversant with recent work.

**Scenario:** You are interviewing for a research engineer role. Your recruiter mentioned the team works heavily on RLHF alignment. You request InstructGPT, Constitutional AI, and Direct Preference Optimization (DPO) and build a study set covering the alignment literature.

### Students in AI Courses
Many university AI/ML courses assign specific papers as required reading. Paper Request turns passive reading into active learning.

**Scenario:** Your NLP course assigns five papers for the week, including BERT, GPT-2, and T5. Instead of just reading them, you quiz yourself after each one and confirm your understanding before the lecture.

---

## The Broader Vision: AI/ML Knowledge from First Principles

The research paper request feature reflects a philosophy that is central to everything Jogg does: **AI/ML should be learned from first principles, not just from tutorials**.

The papers are the first principles. They contain the actual ideas, the actual experiments, the actual tradeoffs that define how modern AI systems work. Everything else — the blog posts, the library documentation, the tutorials — is a derivative of the papers.

When you can discuss the content of landmark papers fluently, when you understand not just what RLHF is but why PPO was chosen as the RL algorithm and what the reward model training involves, you are operating at a qualitatively different level than someone who learned AI from YouTube.

That is the level Jogg is built to help you reach.

---

*Submit a paper request from the Papers Quest section of the app.*
*Questions? Contact us at [email protected].*

*Jogg — Your Professional AI/ML App.*