Project · Feb 28, 2026

Building a Second Brain That Argues With Me

Tags: ai · knowledge-management · personal-tools

TL;DR: I consume a ton of content across every format — articles, podcasts, YouTube videos, conversations — and couldn’t remember what I actually thought about any of it. So I built a system that processes everything I consume, finds connections I’d miss on my own, and actively challenges my existing beliefs. It changed how I think, not just how I store information.


The problem

I consume a lot. Articles, newsletters, research papers, YouTube videos, podcasts, conversations with smart people. Hundreds of pieces of content a year, across every format imaginable. And I kept having this experience: someone would mention a topic and I’d think “I read something about that… six months ago… I think I disagreed with part of it?” But I couldn’t remember the specifics. The insight was gone.

I tried the usual things. Bookmarks. Notion databases. Saving articles to read-later apps. The result was always the same: a growing pile of stuff I’d saved but never revisited. A digital junk drawer that made me feel productive without actually being useful.

The real problem wasn’t storage. I could store anything, anywhere, for free. The problem was that none of these tools helped me think. They were filing cabinets. I needed something that could look at everything I’d read and say “hey, this thing you read about AI pricing last week contradicts what you believed about AI commoditization three months ago.” I needed a system that could find the connections I was too scattered to notice, and then push back on my own thinking.

What I actually built

I built a personal knowledge system. At its core, it’s pretty simple: when I come across something interesting — an article, a YouTube video, a podcast episode, a conversation, anything — I send it to my AI partner Bob (more on him in another post). He processes it regardless of format, extracts the key ideas, and files a structured note. That part takes about 30 seconds of my time.

But the filing is just the beginning. Every night, the system runs a synthesis process. It takes all my notes, compares their ideas using the same kind of math that powers search engines, and looks for connections across different topics. An article about AI agent pricing from a tech newsletter might connect to an observation about enterprise software moats from a completely different source. The system finds these overlaps and writes up what it sees.

The part I’m most excited about is what I call living theses. Instead of just collecting notes, the system maintains a set of positions I currently hold about the world. Things like “AI capabilities become free within two years” or “boring, repetitive workflows are where AI creates the most value.” Each thesis tracks the evidence for and against it, with a confidence score.

When I read something new, the system checks: does this support or challenge any of my existing theses? If a new article contradicts something I believe, it gets flagged. If I’ve been holding a position with high confidence but zero counterevidence, the system tells me I might be in a blind spot. It runs a weekly audit and generates research questions to pressure-test the weakest points in my thinking.

In other words, it argues with me. On purpose.

What surprised me

I expected the value to come from better retrieval. Ask a question, get a relevant note, save time. That’s useful, but it turned out to be the least interesting part.

The real value is in the connections. I keep finding links between ideas from completely different domains that I never would have made on my own. An article about how Stripe runs autonomous coding agents connects to a thesis about boring workflows, which in turn connects to an observation about enterprise sales cycles. These cross-domain links are where the interesting thinking happens, and they’re exactly what a human brain is bad at tracking across hundreds of sources.

The other surprise was how much the thesis system changed my reading habits. When you know that everything you read gets evaluated against your existing beliefs, you start reading differently. You notice when you’re just collecting evidence that confirms what you already think. You start seeking out things that might prove you wrong, because that’s where the system (and your thinking) gets stronger.

How it works, roughly

For the curious: the notes are plain markdown files, stored locally, synced to the cloud, and viewable in a notes app. Nothing proprietary. If the system disappeared tomorrow, I’d still have a folder of well-organized notes.

The intelligence layer runs on top. Each note gets converted into a mathematical representation (an embedding) that captures its meaning. The system compares these representations to find notes that are conceptually similar, even if they use completely different words. A note about “enterprise sales cycles shortening when ROI is obvious” might match with a note about “AI agent value in measurable workflows,” because the underlying concepts overlap.
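To make that comparison concrete, here’s the basic operation in a toy sketch. This is not the actual pipeline code — real embeddings have 1024 dimensions, and the three-number vectors here are made up purely for illustration:

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": conceptually similar notes point in similar directions,
# even when the notes use completely different words.
sales_note = [0.9, 0.1, 0.3]   # "enterprise sales cycles shortening..."
agent_note = [0.8, 0.2, 0.4]   # "AI agent value in measurable workflows"
unrelated  = [-0.5, 0.9, -0.1]

print(cosine_similarity(sales_note, agent_note))  # high, ~0.98
print(cosine_similarity(sales_note, unrelated))   # low
```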

The synthesis process uses a large language model to read pairs of related notes and write up what connects them, where they disagree, and what questions they raise together. The thesis auditor scores each position on confidence, evidence count, staleness, and whether it’s been challenged recently.

Everything runs automatically. I just keep reading and sharing interesting things. The system does the rest.

What’s next

I’m working on closing the loop: when the thesis audit finds a weak spot in my thinking, it should automatically go research the question and bring back what it finds. Right now it generates the questions but I still have to go find the answers manually. Soon it’ll do that too.

I’m also thinking about how to make the theses themselves shareable. Right now they live in my private system, but the idea of publicly maintained positions with tracked evidence and confidence scores feels like it could be interesting for more people. Imagine being able to see someone’s actual track record of predictions and belief updates, not just their latest hot take.

But mostly I’m just enjoying having a system that makes me a better thinker. Consuming hundreds of pieces of content used to mean hundreds of forgotten insights. Now it means a growing web of connected ideas that keeps getting smarter. Including about the things I might be wrong about.

The problem, specifically

I was consuming a lot of content across domains - AI infrastructure, business strategy, investing, engineering practices - and the ideas weren’t connecting. I had notes scattered across five different tools, none of them aware of each other. The failure mode wasn’t retrieval (search is solved). It was synthesis: no system was telling me that an article about Stripe’s autonomous coding agents and an article about enterprise sales cycles were making related observations about measurable workflows.

I wanted three things:

  1. Near-zero-friction capture from any source or format
  2. Automated connection discovery across domains
  3. A mechanism that actively challenges my existing positions, not just files supporting evidence

Architecture

Capture (any channel)
    → Bob (AI agent) processes source, extracts structured note
    → Markdown file in ~/Documents/knowledge/notes/{domain}/
    → YAML frontmatter (tags, source, type, domain, created)
    → 4 required sections: Summary, Key Facts, Key Ideas, Observations

Storage & Embeddings
    → Postgres + pgvector (Docker, localhost)
    → Bedrock Titan Embed v2, 1024 dimensions
    → Section-level chunking + note-level summary embeddings

Connection Discovery
    → Hybrid scoring: tag overlap (0.2) + embedding similarity (0.6) + explicit wikilinks (0.2)
    → Auto-backlinking: writes [[wikilinks]] into note Connections sections

Synthesis (2 AM daily cron)
    → pkb sync → Opus with thinking:high
    → 16-step synthesis framework per note pair
    → Weekly cross-note cluster analysis

Living Theses
    → ~/Documents/knowledge/theses/*.md
    → YAML frontmatter: thesis, confidence (0-1), status, evidence counts
    → Weekly health audit: flags underdeveloped, unchallenged, and stale theses
    → Template-based research question generation
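For illustration, the capture step at the top of that diagram produces notes shaped roughly like this. All of the field values, the URL, and the wikilink target below are invented placeholders, not a real note:

```markdown
---
tags: [ai, pricing]
source: https://example.com/ai-agent-pricing
type: article
domain: ai
created: 2026-02-28
---

## Summary
Two to four sentences capturing the source's thesis and core argument.

## Key Facts
- Concrete, checkable claims from the source.

## Key Ideas
- The conceptual takeaways worth connecting to other notes.

## Observations
- My own reactions and disagreements.

## Connections
- [[enterprise-software-moats]]
```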

Stack choices

Markdown notes in a local vault over a database-first approach. The notes are the source of truth and they sync to Google Drive for Obsidian access. The database is a derived layer - if pgvector disappeared, I’d rebuild from the files. This also means the capture format is human-readable and portable. No proprietary lock-in.

pgvector over a dedicated vector DB (Pinecone, Weaviate, etc.). The corpus is small — hundreds of notes with hundreds of chunks. That’s a toy workload. Postgres handles it without breaking a sweat, and I don’t need a separate service to manage. pgvector’s exact nearest neighbor search is fine at this scale.

Bedrock Titan Embed v2 over OpenAI embeddings. Already on AWS with an assume-role chain set up for Bedrock. 1024 dimensions is plenty. The 8K character input limit was a problem though - more on that below.
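For the curious, invoking Titan Embed v2 through Bedrock looks roughly like this. This is my sketch rather than the system’s actual code: it assumes AWS credentials are already configured, and the defensive truncation reflects the 8K character limit described above.

```python
import json

MODEL_ID = "amazon.titan-embed-text-v2:0"  # Titan Embed v2 on Bedrock

def build_titan_request(text: str, dims: int = 1024) -> str:
    """JSON body for a Titan Embed v2 invocation. Input is truncated
    defensively because the model silently cuts off text past ~8K chars."""
    return json.dumps({
        "inputText": text[:8000],
        "dimensions": dims,   # Titan v2 accepts 256, 512, or 1024
        "normalize": True,    # unit vectors: dot product == cosine similarity
    })

def embed(text: str) -> list[float]:
    import boto3  # AWS SDK; assumes credentials/role chain are set up
    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(modelId=MODEL_ID, body=build_titan_request(text))
    return json.loads(resp["body"].read())["embedding"]
```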

Section-level chunking over fixed-size sliding windows. Every note follows the same template (Summary, Key Facts, Key Ideas, Observations), so I can chunk by section and preserve semantic coherence. A Key Facts chunk contains facts; a Key Ideas chunk contains ideas. This gives the connection scorer meaningful units to compare.
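A minimal version of section-based chunking might look like this. It’s simplified — the real parser presumably handles more edge cases — but it shows why the fixed template makes chunking trivial:

```python
import re

def split_sections(body: str) -> dict[str, str]:
    """Split a templated note body into {section heading: section text}.
    Assumes level-2 headings ('## Summary', '## Key Facts', ...)."""
    sections: dict[str, str] = {}
    current = None
    for line in body.splitlines():
        m = re.match(r"^## (.+)$", line)
        if m:
            current = m.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: text.strip() for name, text in sections.items()}

# Hypothetical note body following the template:
note = """## Summary
Agents shine on repetitive work.

## Key Facts
- Stripe runs autonomous coding agents.

## Key Ideas
- Measurable workflows make ROI obvious.
"""
print(list(split_sections(note)))  # ['Summary', 'Key Facts', 'Key Ideas']
```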

Embedding decisions and tradeoffs

The chunking strategy went through three iterations.

v1: One embedding per section. Simple. Each of the 5-6 sections per note gets a single 1024-dim vector. Problem: Titan’s 8K character limit silently truncates long sections. I didn’t catch this for weeks - longer notes were losing half their content, and the embeddings were quietly worse.

v2: Overlapping windows for long content. If a chunk exceeds 7K characters, split into 6K windows with 1K overlap, breaking at paragraph or sentence boundaries (not mid-word). Each window gets its own embedding: full_1, full_2, etc. Connection scoring uses the max similarity across all of a note’s windows. This fixed the truncation but increased the chunk count from ~500 to ~776.
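The v2 windowing logic, sketched with the parameters above. The boundary handling here is simplified to paragraph breaks only; the real implementation also falls back to sentence boundaries:

```python
def window_chunks(text: str, size: int = 6000, overlap: int = 1000) -> list[str]:
    """Split oversized text into overlapping windows, preferring to break
    at a blank line near the window's end rather than mid-word."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            # Look for a paragraph break in the tail of the window.
            brk = text.rfind("\n\n", start + size - overlap, end)
            if brk > start:
                end = brk
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # consecutive windows share `overlap` chars
    return chunks
```

Scoring against the max across windows means a long note still matches on its most relevant passage, instead of having that passage diluted by everything around it.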

v3 (current): Summary-based note-level embeddings, dropped redundant full-body chunks. I realized the full chunk (entire note body) was both the most expensive and least useful embedding. The summary section already captures the thesis and core argument in 2-4 sentences - that’s a better note-level representation than a diluted average of the whole body. Dropped full-body chunks entirely: 776 → 634 chunks, -18%. Connection quality stayed the same or improved.

The lesson: more embeddings isn’t better. A focused summary embedding outperforms a bloated full-body embedding for note-level similarity. Section-level embeddings still earn their place for discovering which aspect of note A connects to note B.

Hybrid connection scoring

Pure embedding similarity misses things. Two notes might use different vocabulary but have identical tags. Or a human might explicitly link them with a [[wikilink]] that captures a connection no embedding would find. So scoring is a weighted combination:

| Signal | Weight | Why |
| --- | --- | --- |
| Tag overlap | 0.2 | Cheap, catches obvious domain connections |
| Embedding similarity | 0.6 | Semantic similarity regardless of vocabulary |
| Explicit wikilinks | 0.2 | Human-curated connections are high signal |

The weights are hand-tuned and feel about right. Tag overlap alone would miss cross-domain connections (an AI article connecting to an investing article). Embeddings alone would miss connections where the relationship is structural, not semantic. Wikilinks capture “I know these are related because I read both” - a signal the system can’t derive.
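Put together, the scoring is only a few lines. The weights (0.2 / 0.6 / 0.2) are the real ones; the Jaccard normalization for tag overlap is my assumption about how that signal is squashed into [0, 1]:

```python
WEIGHTS = {"tags": 0.2, "embedding": 0.6, "wikilink": 0.2}

def connection_score(tags_a: set[str], tags_b: set[str],
                     embed_sim: float, explicitly_linked: bool) -> float:
    """Weighted blend of the three signals, each normalized to [0, 1]."""
    # Jaccard overlap on tags: shared tags / all tags (assumed normalization).
    union = tags_a | tags_b
    tag_overlap = len(tags_a & tags_b) / len(union) if union else 0.0
    return (WEIGHTS["tags"] * tag_overlap
            + WEIGHTS["embedding"] * embed_sim
            + WEIGHTS["wikilink"] * (1.0 if explicitly_linked else 0.0))

# Cross-domain pair: zero shared tags, but the embeddings agree strongly.
print(connection_score({"ai"}, {"investing"}, 0.82, False))  # ~0.49
```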

Auto-backlinking writes discovered connections into the notes’ ## Connections section as wikilinks. This means Obsidian’s graph view shows the full connection map, and the wikilink signal feeds back into future scoring. Virtuous cycle.
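An idempotent backlink writer might look like this. The function is hypothetical, and it assumes Connections is the final section of a note (which the template makes true):

```python
def add_backlink(note_text: str, target: str) -> str:
    """Append [[target]] under '## Connections', creating the section if
    missing and skipping links that already exist (safe to re-run)."""
    link = f"- [[{target}]]"
    if link in note_text:
        return note_text  # already linked: no duplicate entries
    if "## Connections" not in note_text:
        return note_text.rstrip("\n") + "\n\n## Connections\n" + link + "\n"
    head, _, tail = note_text.partition("## Connections")
    # Assumes Connections is the last section, so appending to its tail is safe.
    return head + "## Connections" + tail.rstrip("\n") + "\n" + link + "\n"
```

Idempotency matters here: the nightly run rediscovers old connections, and you don’t want the Connections section filling up with duplicates.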

Synthesis pipeline

The nightly cron runs pkb sync which:

  1. Detects new/modified notes since last run (registry-based, _meta/index.json)
  2. Embeds new content (incremental - doesn’t re-embed unchanged notes)
  3. Discovers connections above a similarity threshold
  4. For each new high-scoring connection pair: runs a 16-step synthesis using Opus with thinking:high
  5. Re-embeds notes that received new synthesis content
  6. Weekly: full cross-note cluster analysis across the corpus
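Step 1’s registry-based change detection can be sketched with content hashes. The `_meta/index.json` name comes from the list above; the hashing approach is my guess at the implementation:

```python
import hashlib
import json
from pathlib import Path

def changed_notes(vault: Path, registry_path: Path) -> list[Path]:
    """Return notes whose content hash differs from the last recorded run,
    and update the registry so the next run sees them as unchanged."""
    registry = {}
    if registry_path.exists():
        registry = json.loads(registry_path.read_text())
    changed = []
    for note in sorted(vault.rglob("*.md")):
        digest = hashlib.sha256(note.read_bytes()).hexdigest()
        key = str(note.relative_to(vault))
        if registry.get(key) != digest:
            changed.append(note)
            registry[key] = digest
    registry_path.write_text(json.dumps(registry, indent=2))
    return changed

# Demo on a throwaway vault (hypothetical layout):
import tempfile
vault = Path(tempfile.mkdtemp())
(vault / "notes").mkdir()
(vault / "notes" / "ai-pricing.md").write_text("# AI pricing\n...")
registry = vault / "index.json"
print([p.name for p in changed_notes(vault, registry)])  # ['ai-pricing.md']
print(changed_notes(vault, registry))                    # [] (unchanged)
```

Hashing content rather than trusting modification times means a sync tool touching files doesn’t trigger pointless re-embedding.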

The 16-step synthesis framework covers: surface connection identification, assumption comparison, contradiction detection, framework transfer analysis, and actionable implications. It’s opinionated - the output isn’t a generic “these notes are related.” It’s “note A claims X, note B claims Y, and the tension between them suggests Z.”

Cost consideration: Opus with thinking:high is expensive. Each synthesis pair costs ~$0.10-0.15 in tokens. Incremental runs (only new notes) keep nightly costs low. I run the weekly cluster analysis on the same model because that’s where the real insight comes from - it’s worth the cost.

Living theses

This is the layer I’m most interested in. Notes are evidence. Theses are positions.

Each thesis lives in ~/Documents/knowledge/theses/ as a markdown file:

---
thesis: "Boring, repetitive workflows with clear completion criteria are where AI agents create the most value"
confidence: 0.65
status: active
created: 2026-02-28
last_updated: 2026-02-28
---

With sections for Evidence For (wikilinks to supporting notes), Evidence Against, Implications, and Open Questions. Currently 11 active theses with confidence scores ranging from 0.50 to 0.80.

The thesis health audit (pkb thesis-audit) runs weekly and flags:

| Condition | Signal |
| --- | --- |
| Confidence < 0.6 AND evidence < 3 | Underdeveloped - needs evidence |
| Confidence > 0.75 AND zero contradictions | Unchallenged - needs devil’s advocate |
| Not updated in 30+ days | Stale - may have drifted from reality |

For each unhealthy thesis, the engine generates 2-3 template-based research questions. Low confidence + low evidence gets evidence-seeking questions. High confidence + zero contradictions gets devil’s advocate questions (“What conditions would make this false?”). Stale theses get recency questions (“What has changed in the last 30 days?”).
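The audit conditions translate almost directly into code. A sketch (the function signature and field names are my assumptions, not the actual `pkb` internals):

```python
from datetime import date

def audit(confidence: float, evidence: int, contradictions: int,
          last_updated: date, today: date) -> list[str]:
    """Apply the three weekly health checks to a single thesis."""
    flags = []
    if confidence < 0.6 and evidence < 3:
        flags.append("underdeveloped")   # needs evidence
    if confidence > 0.75 and contradictions == 0:
        flags.append("unchallenged")     # needs a devil's advocate
    if (today - last_updated).days > 30:
        flags.append("stale")            # may have drifted from reality
    return flags

print(audit(0.80, 5, 0, date(2026, 1, 1), date(2026, 2, 28)))
# ['unchallenged', 'stale']
```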

The thesis digest (pkb thesis-digest) produces a Telegram-friendly summary of thesis health, flagged items, and pending research questions. Monday morning delivery.

What’s working, what’s not

Working well:

  • Capture friction is genuinely near-zero. I send a URL or video to Bob, get a structured note in 30 seconds. I’ve sustained daily capture for weeks - the habit stuck because the barrier is low.
  • Cross-domain connections are real and surprising. The system found a link between an article about Stripe’s coding agents and a thesis about boring workflows that I wouldn’t have made manually.
  • Nightly synthesis produces actionable output, not summaries. The 16-step framework forces specific claims and identified tensions.

Not working yet:

  • The thesis research engine generates questions but doesn’t execute searches. I’m closing this loop - the audit identifies weak spots, but I still manually research them.
  • Cross-note synthesis is O(n²) in the worst case. At 90 notes it’s fine. At 500 it won’t be. I have a tiered scaling plan (L1: note-level filter, L2: cluster summaries, L3: targeted deep analysis) but haven’t needed it yet.
  • No eval framework for synthesis quality. I read the outputs and they’re good, but I have no systematic way to measure whether they’re getting better or worse over time.

Numbers

  • 90 notes across 6 domains (ai, business, engineering, investing, health, parenting)
  • 634 embeddings (section-level + note-level summary)
  • 11 active theses
  • 160 tests, ruff clean
  • Nightly synthesis: ~2 min runtime, ~$2-5/night in LLM cost
  • Full re-embed: ~45 seconds for the whole corpus
  • Stack: Python 3.14, pgvector on Postgres (Docker), Bedrock Titan v2, Opus for synthesis

What’s next

  1. Close the research loop - thesis audit → research questions → automated web search → evidence integration. The scaffolding is built; the search execution is the missing piece.
  2. Novelty gate at capture time - before creating a note, check if the content adds new information vs. what’s already in the corpus. Reduce redundant notes and wasted synthesis cost.
  3. Tiered synthesis scaling - as the corpus grows past 200 notes, shift from all-pairs comparison to hierarchical: note-level filter → cluster-level → targeted deep analysis.
  4. Synthesis quality evals - build a rubric and measure whether the pipeline is actually producing better insight over time, not just more output.

Have you ever learned something really cool in class, and then a few weeks later someone asks you about it and you go… “Wait, I know I learned that. What was it again?”

That happens to me all the time. But instead of school subjects, it’s stuff I read. Articles, videos, podcasts — I take in a lot of information every day. And most of it just… disappears from my brain. Like writing something on a whiteboard and then someone erases it while you’re not looking.

So I started wondering: what if I had a notebook that didn’t just store what I wrote, but actually thought about it?

The trading card problem

[Illustration: idea cards scattered across a room — on a desk, in boxes, in a backpack, on the bed. Each card has a lightbulb on it.]

Imagine you collect trading cards. You’ve got hundreds of them, all stuffed in different boxes and drawers and maybe some are in your backpack. You know you have good ones in there somewhere. But when your friend asks “do you have any water-type cards?” you have to dig through everything to find out.

That’s what my notes were like. Scattered everywhere. No way to find connections between them.

But here’s the thing that bothered me even more: what if two of your cards are actually part of a really powerful combo, but you never realized it because they were in different boxes? You’d never put them together. You’d never see the pattern.

That’s the real problem I wanted to solve. Not just finding my notes — but finding connections between ideas that I’d never spot on my own.

[Illustration: the same idea cards, now organized on a board with glowing golden threads connecting related ones. A friendly robot points at a connection.]

So what does it actually do?

When I find something interesting — a video, an article, a podcast, anything — I send it to my AI helper named Bob. Bob reads it (or watches it, or listens to it) and writes down the most important ideas in a really organized way. That takes me about 30 seconds.

But here’s where it gets interesting.

Every night, while I’m asleep, the system goes through ALL of my notes and asks: “Do any of these ideas connect to each other?” It’s like having a really smart librarian who reads every single book in the library and then comes to you in the morning and says “Hey, did you know that this book about animals and this book about weather are actually talking about the same thing from different angles?”

Sometimes it finds connections I would never have made. An idea from a technology article connects to something from a business article in a way that makes me think about both of them differently.

The part that argues with me

Here’s my favorite part. The system keeps track of things I believe. Not just facts I’ve learned, but opinions and ideas I have about how the world works.

For example, one thing I believe is: “The most useful thing AI can do is handle boring, repetitive tasks.” The system keeps a list of evidence that supports that belief AND evidence that challenges it. Like a debate team, but both sides are working for me.

Every week, it checks: Am I being fair about my own ideas? If I believe something really strongly but I’ve never looked at any evidence against it, the system says “Hey, you might have a blind spot here. Have you considered that you could be wrong?”

Think about that for a second. How often do you check whether your own ideas might be wrong? It’s really hard to do. We all like feeling right. But getting better at thinking means being willing to look at the other side, even when it’s uncomfortable.

Why this matters (for you too)

You don’t need a fancy computer system to do something like this. Here’s what you can try right now:

Next time you learn something new — in class, from a book, from a YouTube video — write it down in one sentence. Just the main idea. Keep all those sentences in the same place, like a jar full of ideas.

Every once in a while, dump them all out and look at them together. Ask yourself: Do any of these connect? Does this idea from science class have anything to do with that thing I learned in history?

You’ll be surprised. Ideas from completely different places often have hidden links between them. Finding those links is one of the most powerful things your brain can do. It’s called thinking across boundaries, and most adults are terrible at it.

Something to think about

Here’s a question I still don’t have a good answer to:

If a computer can find connections between ideas that I missed, does that mean the computer is smarter than me? Or does it just mean it’s better at one specific thing — like how a calculator is better at math but can’t tell you why a joke is funny?

What do you think?