new file mode 100644
index 0000000..966cd22
@@ -0,0 +1,97 @@
+---
+visibility: public-edit
+---
+
+# how to design tests, iterate, and ensure robustness
+
+from running dozens of experiments at a [[neurotech startup|wiki/research-notes/signal-processing-workflow]] — on brains, on hardware, on signals — i developed a sense for what separates experiments that produce useful data from experiments that produce noise.
+
+## the experiment design framework
+
+### define what you're measuring before you start
+
+the most common failure is starting a test without a clear intent. from my own notes: "not insanely intentional with testing. could do much better with understanding what is going on and for what reason."
+
+concretely, before any test:
+- what signal am i looking for?
+- what would success look like in the data?
+- what would failure look like?
+- what are the confounds?
+
+for our visual evoked potential tests, the checklist was:
+- one eye (monocular)
+- fixate on central target
+- dark room
+- 70-100 cm from screen
+- consider contrast
+- which test type (pattern reversal at 2 Hz, onset/offset, flash)
+
+### control one variable at a time
+
+we made the mistake of changing multiple things between tests — new stimulus pattern AND new filtering AND new electrode placement. when results changed, we couldn't tell why.
+
+the better approach: "do something to get a VEP on myself first." test the simplest possible case. if that doesn't work, the problem is fundamental. if it does work, add complexity one variable at a time.
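+
+a minimal sketch of how the one-variable rule could be enforced before a run, assuming each test is described by a plain dict of settings. the keys and values here (`stimulus`, `filter_hz`, `electrode`, `"Oz"`) and the function names are hypothetical, not our actual config format.
+
+```python
+_MISSING = object()  # sentinel so an absent key differs from a key set to None
+
+
+def changed_params(previous, current):
+    """return the sorted names of settings that differ between two runs."""
+    keys = sorted(set(previous) | set(current))
+    return [k for k in keys if previous.get(k, _MISSING) != current.get(k, _MISSING)]
+
+
+def require_single_change(previous, current):
+    """refuse to proceed if more than one setting changed since the last run."""
+    changed = changed_params(previous, current)
+    if len(changed) > 1:
+        raise ValueError("changed more than one variable: " + ", ".join(changed))
+    return changed
+
+
+# illustrative values only
+baseline = {"stimulus": "checkerboard", "filter_hz": (1, 40), "electrode": "Oz"}
+next_run = {"stimulus": "flash", "filter_hz": (1, 40), "electrode": "Oz"}
+
+print(require_single_change(baseline, next_run))
+```
+
+if a run also changed the filter or the electrode, the check raises instead of letting an uninterpretable test go ahead.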
+
+### the validation ladder
+
+from simplest to most complex:
+1. can you see alpha waves with eyes closed? (if no, hardware is broken)
+2. can you see a response to a flash on yourself? (if no, timing or processing is broken)
+3. can you see a response on another person? (if no, setup or parameters might be off)
+4. can you reproduce results on a second trial? (if no, noise is too high)
+5. can you see differences between conditions? (this is the actual experiment)
+
+skipping to step 5 without passing steps 1-4 was a mistake i made repeatedly.
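+
+one way to make the ladder hard to skip is to encode steps 1-4 as ordered checks and stop at the first one that fails. this is only a sketch: `Rung`, `first_failed_rung`, and the lambda stand-ins are hypothetical, and in practice each check would be a real analysis (or a human looking at a plot and answering yes or no).
+
+```python
+from typing import Callable, NamedTuple
+
+
+class Rung(NamedTuple):
+    question: str
+    check: Callable[[], bool]   # returns True if this rung passes
+    diagnosis: str              # what a failure most likely means
+
+
+def first_failed_rung(rungs):
+    """walk the ladder in order; return (step number, rung) for the first failure."""
+    for step, rung in enumerate(rungs, start=1):
+        if not rung.check():
+            return step, rung
+    return None
+
+
+# placeholder checks: here step 2 is pretended to fail
+ladder = [
+    Rung("alpha waves with eyes closed?", lambda: True, "hardware is broken"),
+    Rung("flash response on yourself?", lambda: False, "timing or processing is broken"),
+    Rung("response on another person?", lambda: True, "setup or parameters might be off"),
+    Rung("reproducible on a second trial?", lambda: True, "noise is too high"),
+]
+
+failed = first_failed_rung(ladder)
+if failed is None:
+    print("steps 1-4 pass: ok to run the actual experiment")
+else:
+    step, rung = failed
+    print(f"stop at step {step}: {rung.diagnosis}")
+```
+
+later rungs are never evaluated once an earlier one fails, which is the point: a failure at step 2 says nothing useful about steps 3-5.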
+
+## iteration patterns
+
+### the research → test → analyze loop
+
+"lots of iteration. lots of failure." the actual workflow:
+1. research: read standards, look at what parameters others use
+2. set up: configure hardware, write stimulus script, prepare environment
+3. run: execute the test, usually 15-45 minutes
+4. analyze: process data, plot, look at results
+5. interpret: is this real signal or noise? compare to expected results.
+6. adjust: change one parameter and go back to step 3
+
+steps 3-6 repeat dozens of times. "tested checkerboard on myself, tried flash on myself, tried checkerboard on another person, old stuff was all noise."
+
+### when to pivot vs persist
+
+two of my notes pull in opposite directions: "just check differences and try to maximize it" and "maybe they are just good, we are reasonably satisfied." that is the tension between perfectionism and pragmatism.
+
+heuristic: if you've tried 5 different parameter combinations and none work, the problem is probably not the parameters. step back and question the approach.
+
+"tried many training things, didn't do much" is what that point looks like in practice. the skill is knowing when to pivot to a completely different method rather than keep tweaking the current one.
+
+## the timing problem
+
+timestamp synchronization was a recurring nightmare. each device (EEG headset, stimulus presentation, sensors) has its own clock. what we tried to get them aligned:
+
+- a wire carrying sync pulses: "doesn't work because it sends a pulse that is picked up by the headset in a huge spike"
+- software timestamps: "current script might not be using some settings that are important"
+- a logging approach: eventually worked, but required careful validation
+
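+"need to get correct times." without precise timing, epoch averaging is meaningless: if the event markers jitter relative to the true stimulus onset, the evoked response gets smeared out or cancels in the average. this is an unsexy but critical part of [[experiment design|wiki/research-notes/experiment-design]] that textbooks don't emphasize enough.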
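+
+a small synthetic demonstration of why marker timing matters for epoch averaging. everything here is assumed for illustration: the 250 Hz sample rate, the gaussian "response" peaking 100 ms after onset, and the roughly ±48 ms of marker jitter. it is not our pipeline and uses no real data.
+
+```python
+import math
+import random
+
+FS = 250                      # sample rate in Hz (assumed)
+EPOCH_LEN = round(0.3 * FS)   # 300 ms window after each marker
+PEAK_AT = round(0.1 * FS)     # synthetic response peaks 100 ms after onset
+WIDTH = 0.01 * FS             # response width (std dev) in samples
+
+
+def make_signal(onsets, n_samples):
+    """noise-free recording with one gaussian bump after each true onset."""
+    signal = [0.0] * n_samples
+    for onset in onsets:
+        for k in range(EPOCH_LEN):
+            signal[onset + k] += math.exp(-0.5 * ((k - PEAK_AT) / WIDTH) ** 2)
+    return signal
+
+
+def average_epochs(signal, markers):
+    """cut one epoch per marker and average them sample by sample."""
+    epochs = [signal[m:m + EPOCH_LEN] for m in markers]
+    return [sum(column) / len(epochs) for column in zip(*epochs)]
+
+
+rng = random.Random(0)
+true_onsets = [(i + 1) * FS for i in range(40)]                 # one stimulus per second
+bad_markers = [t + rng.randint(-12, 12) for t in true_onsets]   # markers off by up to 12 samples
+
+signal = make_signal(true_onsets, n_samples=42 * FS)
+aligned_peak = max(average_epochs(signal, true_onsets))
+jittered_peak = max(average_epochs(signal, bad_markers))
+
+print(f"peak with correct markers:  {aligned_peak:.2f}")
+print(f"peak with jittered markers: {jittered_peak:.2f}")
+```
+
+with correct markers the average keeps the full height of the response. with jittered markers the same recording averages to a lower peak, even though there is no noise in the signal at all.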
+
+## designing for robustness
+
+### the subject experience
+
+"if want longer tests, distraction — brain not focusing." for human experiments, the subject's experience matters:
+- boredom causes attention drift
+- watching videos might confound the signal (occipital lobe activation, pupil changes from brightness)
+- need to balance test duration with data quality
+
+"how to have them not be bored? video? inscapes might not be bad — could confound occipital lobe, could affect pupils due to brightness."
+
+### the engineering gamble
+
+sometimes you have to make a call with incomplete information, and the tooling doesn't take that burden away: "data analysis stuff: hard because big data, for robustness can't really use LLMs that much."
+
+the meta-skill: knowing when you have enough data to make a decision vs when you're just pattern-matching on noise. "interpreting graphs — sometimes good but looks bad, sometimes bad but looks good. need to balance time scrutinizing graph and writing script."
+
+---
+
+*see also: [[signal processing workflow|wiki/research-notes/signal-processing-workflow]], [[debugging hardware|wiki/research-notes/debugging-hardware]], [[reading papers|wiki/research-notes/reading-papers]]*
\ No newline at end of file