@harrisonqian / Work Reflections / wiki/research-notes/experiment-design.md
---
visibility: public-edit
---

# how to design tests, iterate, and ensure robustness

from running dozens of experiments at a [[neurotech startup|signal-processing-workflow]] — on brains, on hardware, on signals — i developed a sense for what separates experiments that produce useful data from experiments that produce noise.

## the experiment design framework

### define what you're measuring before you start

the most common failure: "not insanely intentional with testing. could do much better with understanding what is going on and for what reason."

concretely, before any test:

- what signal am i looking for?
- what would success look like in the data?
- what would failure look like?
- what are the confounds?

for our visual evoked potential tests, the checklist was:

- one eye (monocular)
- fixate on central target
- dark room
- 70-100cm from screen
- consider contrast
- which test type (pattern reversal at 2Hz, onset/offset, flash)

### control one variable at a time

we made the mistake of changing multiple things between tests — new stimulus pattern AND new filtering AND new electrode placement. when results changed, we couldn't tell why.

the better approach: "do something to get a VEP on myself first." test the simplest possible case. if that doesn't work, the problem is fundamental. if it does work, add complexity one variable at a time.

### the validation ladder

from simplest to most complex:

1. can you see alpha waves with eyes closed? (if no, hardware is broken)
2. can you see a response to a flash on yourself? (if no, timing or processing is broken)
3. can you see a response on another person? (if no, setup or parameters might be off)
4. can you reproduce results on a second trial? (if no, noise is too high)
5. can you see differences between conditions? (this is the actual experiment)

skipping to step 5 without passing steps 1-4 was a mistake i made repeatedly.

## iteration patterns

### the research → test → analyze loop

"lots of iteration. lots of failure."
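rung 1 of the ladder above is cheap to automate before any iteration pass. a minimal sketch in python, assuming raw single-channel EEG as a numpy array; the sampling rate, the 1.5x ratio threshold, and the function names here are placeholders, not values from our actual setup:

```python
# rung 1 of the validation ladder: alpha (8-12 Hz) power should rise
# noticeably when eyes are closed. quick band-power comparison with scipy.
# sketch only: fs, the band edges, and the ratio threshold are assumptions.
import numpy as np
from scipy.signal import welch

def band_power(x, fs, lo, hi):
    """average PSD power in [lo, hi] Hz via Welch's method."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 2 * int(fs)))
    mask = (freqs >= lo) & (freqs <= hi)
    return psd[mask].mean()

def alpha_check(eyes_open, eyes_closed, fs=250.0, ratio=1.5):
    """True if eyes-closed alpha power exceeds eyes-open by `ratio`."""
    p_open = band_power(eyes_open, fs, 8.0, 12.0)
    p_closed = band_power(eyes_closed, fs, 8.0, 12.0)
    return p_closed > ratio * p_open
```

if this gate fails, skip steps 2-5 and check the hardware first.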
the actual workflow:

1. research: read standards, look at what parameters others use
2. set up: configure hardware, write stimulus script, prepare environment
3. run: execute the test, usually 15-45 minutes
4. analyze: process data, plot, look at results
5. interpret: is this real signal or noise? compare to expected results.
6. adjust: change one parameter and go back to step 3

steps 3-6 repeat dozens of times. "tested checkerboard on myself, tried flash on myself, tried checkerboard on another person, old stuff was all noise."

### when to pivot vs persist

"just check differences and try to maximize it" vs "maybe they are just good, we are reasonably satisfied." the tension between perfectionism and pragmatism.

heuristic: if you've tried 5 different parameter combinations and none work, the problem is probably not the parameters. step back and question the approach.

"tried many training things, didn't do much" — knowing when to pivot to a completely different method rather than tweaking the current one.

## the timing problem

timestamp synchronization was a recurring nightmare. each device (EEG headset, stimulus presentation, sensors) has its own clock. getting them aligned:

- tried using a wire to send sync pulses — "doesn't work because it sends a pulse that is picked up by the headset in a huge spike"
- tried software timestamps — "current script might not be using some settings that are important"
- tried logging approach — eventually worked but required careful validation

"need to get correct times" — without precise timing, epoch-averaging is meaningless. this is an unsexy but critical part of [[experiment design|experiment-design]] that textbooks don't emphasize enough.

## designing for robustness

### the subject experience

"if want longer tests, distraction — brain not focusing."
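an aside on the timing problem above, since the logging approach deserves a concrete shape: fit one clock against the other, then cut epochs in the corrected timebase. a minimal sketch; the sync pairs, sampling rate, and epoch window below are placeholders, not our real parameters:

```python
# align stimulus-clock times to headset-clock times with a linear fit
# (handles both constant offset and slow drift), then cut and average
# epochs around each corrected stimulus onset.
# sketch only: sync pairs, fs, and the epoch window are assumptions.
import numpy as np

def fit_clock(stim_sync, eeg_sync):
    """least-squares map from stimulus clock to EEG clock."""
    slope, intercept = np.polyfit(stim_sync, eeg_sync, 1)
    return lambda t: slope * np.asarray(t) + intercept

def epoch_average(eeg, fs, onsets_eeg_time, pre=0.1, post=0.4):
    """average EEG segments from `pre` s before to `post` s after each onset."""
    n_pre, n_post = int(pre * fs), int(post * fs)
    epochs = []
    for t in onsets_eeg_time:
        i = int(round(t * fs))
        if i - n_pre >= 0 and i + n_post <= len(eeg):
            epochs.append(eeg[i - n_pre : i + n_post])
    return np.mean(epochs, axis=0)
```

with misaligned clocks the onsets smear across samples and the evoked average flattens toward zero, which is exactly why epoch-averaging is meaningless without correct times.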
for human experiments, the subject's experience matters:

- boredom causes attention drift
- watching videos might confound the signal (occipital lobe activation, pupil changes from brightness)
- need to balance test duration with data quality

"how to have them not be bored? video? inscapes might not be bad — could confound occipital lobe, could affect pupils due to brightness."

### the engineering gamble

sometimes you have to make a call with incomplete information. "data analysis stuff: hard because big data, for robustness can't really use LLMs that much."

the meta-skill: knowing when you have enough data to make a decision vs when you're just pattern-matching on noise. "interpreting graphs — sometimes good but looks bad, sometimes bad but looks good. need to balance time scrutinizing graph and writing script."

---

*see also: [[signal processing workflow|signal-processing-workflow]], [[debugging hardware|debugging-hardware]], [[reading papers|reading-papers]]*