Organizing Research
- Research Experiments, Processes and Systematization
- Mimicking Great Scientists
- Ideas Worth Implementing as Research Org
- Publishing Decomposition / Recombination
- Properties of Research Orgs
- Open Research
Machine Learning Polymath Project
As far as I know, there have not been any massively collaborative projects in machine learning research. If commenters know of any, please link them here. Examples of mathematics projects are collected on this page.
The mathematician Tim Gowers describes an idea in "Is massively collaborative mathematics possible?" for doing research in an open, widely collaborative style. Michael Nielsen describes the method and results briefly in his book Reinventing Discovery:
“In January 2009, Gowers decided to use his blog to run a very unusual social experiment. He picked out an important and difficult unsolved mathematical problem, a problem he said he’d “love to solve.” But instead of attacking the problem on his own, or with a few close colleagues, he decided to attack the problem completely in the open, using his blog to post ideas and partial progress. What’s more, he issued an open invitation asking other people to help out. Anyone could follow along and, if they had an idea, explain it in the comments section of the blog. Gowers hoped that many minds would be more powerful than one, that they would stimulate each other with different expertise and perspectives, and collectively make easy work of his hard mathematical problem. He dubbed the experiment the Polymath Project.”
“...And just three minutes after that, UCLA mathematician Terence Tao—like Gowers, a Fields medalist—added a comment. The comments erupted: over the next 37 days, 27 people wrote 800 mathematical comments, containing more than 170,000 words. Reading through the comments you see ideas proposed, refined, and discarded, all with incredible speed. You see top mathematicians making mistakes, going down wrong paths, getting their hands dirty following up the most mundane of details, relentlessly pursuing a solution. And through all the false starts and wrong turns, you see a gradual dawning of insight. Gowers described the Polymath process as being “to normal research as driving is to pushing a car.””
One dramatic difference between the fields is that machine learning research has an experimentation step in the loop. Still, many problems seem amenable to the advantages of differences in skill set, fresh eyes and perspective on a problem, knowledge of different swaths of the research literature, and the other advantages listed in Tim's post.
A major and unanswered question is how to do credit assignment.
Running an experiment like this may, for some kinds of problem, reveal a new and dramatically improved research process. Concretely, announcing a clear research goal backed by a shared codebase, shared conversational thread, and shared results visualizations and evaluation can kickstart the experiment. The project likely has to be kept internal, but even internally there are more than enough potential participants to explore this mode of research.
The major example of an attempt like this in the machine learning research community is Francois Chollet's AI-ON. Successful open communities have formed around learning and open source code, such as Fast.ai and Andrew Trask's OpenMined. Kaggle's competitions could be seen as a crowdsourced dataset-specific research platform. Numerai and Quantopian attempted to create crowdsourced hedge funds.
A Specific Proposal
Tim Gowers started the Polymath Project with a very specific research problem: exploring a combinatorial approach to the density Hales-Jewett theorem for k=3 (DHJ(3)). It produced two papers authored by D. H. J. Polymath.
Organizations
One option is to convince a working group at Brain or OpenAI to do collaboration in the open. A second option is to post (or have another researcher post) an open problem to an open forum like a blog, and have others chime in. Andrej Karpathy, Chris Olah, Eric Jang, John Hutch, and Shakir Mohamed have active blogs. A third option would be to run this through Distill directly.
Research Area
Certain problems lend themselves to crowdsourced research collaboration. Compute constraints block off a fraction of research ideas; in the maximally accessible, ideal case, a notebook would be sufficient to demonstrate progress.
For work that I’m involved in, improving model calibration looks like a place with low hanging fruit and where many perspectives can be useful. Most recalibration techniques can be run in a notebook. Model logits can be shared for recalibration. Experiments are inexpensive.
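As a concrete illustration, temperature scaling is one such notebook-scale recalibration technique: a single scalar is fit on held-out logits to minimize negative log-likelihood. A minimal numpy sketch (the function names and grid-search range are illustrative, not from any shared codebase):

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    # Negative log-likelihood of the true labels at a given temperature.
    probs = softmax(logits / temperature)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    # Grid-search the single scalar that minimizes validation NLL.
    return min(grid, key=lambda t: nll(logits, labels, t))
```

Because the only input is a matrix of logits and a label vector, this is exactly the kind of experiment that works on shared model logits without access to the model itself.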
Adversarial examples research also feels promising. Many examples can be generated in a compute-lite environment. Much progress is mathematical.
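For instance, the fast gradient sign method (FGSM) can be demonstrated on even a linear model with nothing but numpy. A minimal sketch (the logistic-regression setup and the `fgsm` helper are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    # For logistic regression p = sigmoid(w . x + b), the gradient of the
    # binary cross-entropy loss with respect to the input is (p - y) * w.
    # FGSM perturbs the input by eps in the sign of that gradient.
    p = sigmoid(x @ w + b)
    grad = (p - y) * w
    return x + eps * np.sign(grad)
```

A bounded perturbation like this runs instantly on a laptop, which is what makes the area accessible to a distributed group.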
Kaggle Inspired[a]
Incredible idea: a Kaggle-like competition for machine learning benchmarks, where your name gets on the paper if at any point you hold the top-performing model, with the condition that you share your solution in a gist or GitHub repo. This would work for something like class discovery, maximizing dataset reconstruction accuracy over a few datasets simultaneously. First authorship goes to the final winning team.
If I put more thought into designing the incentive structure properly, this could blow up. Especially if we could reserve a residency spot at Brain for the top performer who accepts it, or something akin to that.
OpenMined Inspired
Create a Slack channel for questions, help with tutorials, planning, and decision making. A GitHub organization hosts shared contributions (there are >15 repositories). A website carries the vision statement, routes people to research projects, and covers previous successes.
AI-ON Inspired
Problems are curated into a list. Initially there were 10 problems, which were pushed to legacy (ask Francois what went wrong). Alternatively, choose problems that existing researchers are already working on. It looks like this second approach died as well: the Slack join link is dead, and the consciousness prior repository has had no updates in the last 2 years.
GitHub repos are used to coordinate. There's a list of relevant work to read before contributing. The researchers create an issue for each part of an open problem where help is wanted.
For communication, there is a Slack channel and a Google group for each project. Both are now dead for every project, but presumably this is where extra coordination happened.
University Course
Have a professor who makes original research one of the course requirements point to this platform as an accessible path to get into research.
Research Prediction
There’s a question of forecasting in research: in deciding both what to work on and how to solve a given technical problem, a researcher predicts whether a hypothesized solution will work and, if so, how well.
In his book Superforecasting, Philip Tetlock describes his team's discovery that there are dramatic differences in forecasters' abilities to predict the outcomes of events, and identifies the tactics and mentalities of the strongest forecasters.
Are there researchers in machine learning whose skill in predicting the effectiveness of solutions, as well as generating them, makes a different research process possible?
If it does, and if forecasting skill is measurable, researchers shown to be high-quality forecasters can make predictions about which experiment ideas are likely to perform well, dramatically cutting down the time required to make progress. Researchers shown to generate high-quality experiment ideas can feed them to colleagues who criticize and predict experiment outcomes well.
One concrete experiment would be to take the results of 20-30 relatively recent published papers whose results are not widely known and describe the methods applied to each dataset, asking researchers to predict the gain (or lack thereof) in the metric used to track progress.
Researchers shown to be effective predictors can make predictions on experiment ideas that have not been implemented yet, scoring them for potential. For some types of idea, this filtration step may dramatically reduce unnecessary and predictable technical risk.
A natural fear is that this will systematically discourage ideas that don’t accord with the intuitions of strong forecasters, which may reduce the incidence of results that conflict with shared assumptions and biases. As these are often the most important results, it makes sense to take care not to scuttle the ideas that lead to them.
Process
Begin by teaching a brief 1-2 hour course to willing researchers on:
- Reference Class Comparison
- Strike the right balance between inside and outside views
- Start with the outside view first, update with inside views
- Triage
- Focus your attention on the questions where your efforts have a good chance of making progress. Not on the intractable predictions, and not on the trivial predictions.
- Decomposition
- (Break seemingly intractable problems into tractable sub-problems)
- Be nuanced about exactly how much new information should affect your forecast
- Look for the clashing causal forces at work in each problem
- Decompose to the causal level
- Re-represent the problem
- Be uncertain in a nuanced way.
- “Strive to distinguish as many degrees of doubt as the problem permits but no more.”
- Balancing under- and overconfidence
- Balancing prudence and decisiveness
- Look for the errors
- Beware of hindsight bias.
- List all major cognitive biases here
- Balance Errors
- Predicting in teams.
Or better yet, randomly select half the researchers to receive this prediction training, leave the other half untrained, and see whose performance is superior.
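The comparison at the end of such a randomized trial can be kept simple. A sketch of a two-sample permutation test on forecast error scores, assuming lower scores are better (the function name and its defaults are illustrative):

```python
import numpy as np

def permutation_pvalue(trained, untrained, n_iter=10000, seed=0):
    # Tests whether the trained group's mean error is lower than the
    # untrained group's by shuffling group labels many times.
    rng = np.random.default_rng(seed)
    trained = np.asarray(trained, dtype=float)
    untrained = np.asarray(untrained, dtype=float)
    observed = untrained.mean() - trained.mean()
    pooled = np.concatenate([trained, untrained])
    n = len(trained)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = pooled[n:].mean() - pooled[:n].mean()
        if diff >= observed:
            count += 1
    return count / n_iter
```

A permutation test avoids distributional assumptions, which matters here since the group sizes would be small.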
Research prediction is where medicine was in the 1800s, when ignorance and confidence were medicine's defining features. Fringe or mainstream, almost all of it was wrong, with treatments ranging from the frivolous to the dangerous.
Tetlock opens with a chapter that effectively says that experts are often wrong, and over long time spans can’t beat random chance on many issues. But prediction is possible, and it’s possible to improve.
Types of Forecasting:
- Benchmark progress
- Specific experiment outcomes
- Trends (ex., paper counts in a category like Model-Based RL or Causality)
One major challenge seems to be researcher familiarity. Benchmarks take time to understand and there are many of them. Understanding the tasks in a benchmark is strongly relevant to forecasting performance on it.
A simplified version of this competition would draw all paper and benchmark predictions from a single benchmark, rather than expecting researchers to learn all of them.
Researchers arrive with different levels of familiarity with any benchmark, so it may be good to evaluate predictions on benchmarks they've all just been introduced to, leveling the playing field so the best predictor wins rather than whoever happened to have done research on the given benchmark.
We can prefer continuous-valued prediction targets over binary or categorical ones. Continuous targets allow for degrees of error, and are a much easier signal to calibrate against.
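Scoring either kind of prediction is straightforward. A sketch, assuming each forecaster submits either probabilities for binary outcomes (scored with the Brier score) or continuous metric estimates (scored by absolute error); the names and data layout are illustrative:

```python
import numpy as np

def brier_score(probs, outcomes):
    # Mean squared error between predicted probabilities and 0/1 outcomes.
    return float(np.mean((np.asarray(probs) - np.asarray(outcomes)) ** 2))

def mean_abs_error(preds, actuals):
    # Degrees of error on a continuous target, e.g. predicted metric gain.
    return float(np.mean(np.abs(np.asarray(preds) - np.asarray(actuals))))

def rank_forecasters(forecasts, actuals):
    # forecasts: {name: list of predicted metric values}; lowest error first.
    return sorted(forecasts,
                  key=lambda name: mean_abs_error(forecasts[name], actuals))
```

The continuous score distinguishes a near-miss from a wild miss, which is exactly the calibration signal the binary case throws away.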
Predicting events that have already happened but are poorly known removes the time-consuming wait for predictions to resolve. It also stops a researcher from working on the very problem they've made predictions about and thereby contaminating the results.
Incentive structures could be built around the predictions. One game could be a Kaggle style competition, where a leaderboard tracks the most effective predictors. Financial rewards could be tied to the predictions, and it could be possible to register to predict from anywhere. An alternative market could allow bets to be made on research progress, putting people’s money on the line.
There’s an organization that could be built around the predictions - some fraction of the predictions could be sold to venture capitalists or fed into stock-market trading algorithms. Breakthroughs in machine translation or facial recognition have direct implications for the value of many companies. A broader technological forecasting could be built on the back of whatever infrastructure (codebase, UI, etc.) was built for this forecasting system.
Examples of similar projects: I'm surprised at how rare this is. There's the Wikipedia overview of Technology Forecasting, but it's much higher level than what I have in mind.
Useful resources: Websites that aggregate benchmark performance offer strong lists of benchmarks to consider hosting forecasting predictions over (or can be scraped for past progress). The Brain projects list is a good place to start for ideas that could be evaluated for prediction.
One person to work with on this is Ray Kurzweil, who has been making predictions about artificial intelligence for more than two decades. Ray likely knows researchers who have made accurate predictions, and he has experience evaluating and structuring feedback. His predictions tend to be at the time scale of decades, though, and are high-level enough that using past data wouldn't quite work. That time scale means that evaluating researchers this way would be incredibly time consuming.
Tetlock is another natural choice, though there's no guarantee that he's open to collaboration right now. That said, if an important forecasting tournament or event is definitely happening, it may not be too difficult to get him on board; he could also teach the forecasting course.
Specific Takeaways from Great Researchers
- The behavior of great scientists is an amazing source of worthwhile experiment ideas.
- It’s unclear what practices are causally connected to consistent breakthrough, and this experimentation would attempt to clarify those connections.
- Surveying great scientists will lead to a list of the skills and techniques which can be identified (in discovering & hiring scientists) or trained.
- The data from an overview will provide valuable source data for abstracting over scientists into scientific styles, a categorization scheme that can guide decision making.
On point 1: systematically experiment (to the degree that it's possible) with the biographical content of great researchers. As a worked example, consider Nikola Tesla.
Nikola Tesla
- His visualization method is "radically opposite" to the experimental one: a new scientific method where the hypotheses are all tested in the mind, not in the world. It's a much more rapid form of development and perfection.
- “... This I did constantly until I was about seventeen, when my thoughts turned seriously to invention. Then I observed to my delight that I could visualize with the greatest facility. I needed no models, drawings or experiments. I could picture them all as real in my mind. Thus I have been led unconsciously to evolve what I consider a new method of materializing inventive concepts and ideas, which is radically opposite to the purely experimental and is in my opinion ever so much more expeditious and efficient.” {Tesla Autobiography, Pg. 12}
- Einstein has something very similar.
- Incredibly hard working.
- “I had made up my mind to give my parents a surprise, and during the whole first year I regularly started my work at three o’clock in the morning and continued until eleven at night, no Sundays or holidays excepted. As most of my fellow-students took things easily, naturally enough I eclipsed all records. In the course of that year I past thru nine exams and the professors thought I deserved more than the highest qualifications. Armed with their flattering certificates, I went home for a short rest, expecting a triumph, and was mortified when my father made light of these hard won honors. That almost killed my ambition; but later, after he had died, I was pained to find a package of letters which the professors had written him to the effect that unless he took me away from the Institution I would be killed thru overwork.” {Tesla Autobiography, Pg. 37}
- Imagination “affliction” during childhood.
- “In my boyhood I suffered from a peculiar affliction due to the appearance of images, often accompanied by strong flashes of light, which marred the sight of real objects and interfered with my thoughts and action. … “When a word was spoken to me the image of the object it designated would present itself vividly to my vision and sometimes I was quite unable to distinguish whether what I saw was tangible or not. This caused me great discomfort and anxiety. None of the students of psychology or physiology, whom I have consulted, could ever explain satisfactorily these phenomena.” {Tesla Autobiography, Pg. 9}
- Deep love of thinking.
- “Every effort under compulsion demands a sacrifice of life-energy. I never paid such a price. On the contrary, I have thrived on my thoughts.” {Tesla Autobiography, Pg. 5}
- Mother was an inventor of first order.
- “She invented and constructed all kinds of tools and devices and wove the finest designs from thread which was spun by her.” {Tesla Autobiography, Pg. 9}
- “She worked indefatigably, from break of day till late at night, and most of the wearing apparel and furnishings of the home were the product of her hands.” {Tesla Autobiography, Pg. 9}
- Tesla was a visualization MACHINE, traveling the world in his imagination, where the people he met and knew in his mind were just as real as those in real life.
- Imagination based design
- “My method is different. I do not rush into actual work. When I get an idea I start at once building it up in my imagination. I change the construction, make improvements and operate the device in my mind. It is absolutely immaterial to me whether I run my turbine in thought or test it in my shop. I even note if it is out of balance. There is no difference whatever, the results are the same. In this way I am able to rapidly develop and perfect a conception without touching anything. When I have gone so far as to embody in the invention every possible improvement I can think of and see no fault anywhere, I put into concrete form this final product of my brain. Invariably my device works as I conceived that it should, and the experiment comes out exactly as I planned it. In twenty years there has not been a single exception. Why should it be otherwise? Engineering, electrical and mechanical, is positive in results. There is scarcely a subject that cannot be mathematically treated and the effects calculated or the results determined beforehand from the available theoretical and practical data. The carrying out into practise of a crude idea as is being generally done is, I hold, nothing but a waste of energy, money and time.” {Tesla Autobiography, Pg. 12}
- Reading Voraciously
- “Of all things I liked books the best. My father had a large library and whenever I could manage I tried to satisfy my passion for reading. He did not permit it and would fly into a rage when he caught me in the act. He hid the candles when he found that I was reading in secret. He did not want me to spoil my eyes. But I obtained tallow, made the wicking and cast the sticks into tin forms, and every night I would bush the keyhole and the cracks and read, often till dawn, when all others slept and my mother started on her arduous daily task.” {Tesla Autobiography, Pg. 16}
- Completion Obsession
- “I had a veritable mania for finishing whatever I began, which often got me into difficulties. On one occasion I started to read the works of Voltaire when I learned, to my dismay, that there were close on one hundred large volumes in small print which that monster had written while drinking seventy-two cups of black coffee per diem. It had to be done, but when I laid aside the last book I was very glad, and said, “Never more!””{Tesla Autobiography, Pg. 38}
- Book Memorization
- “One afternoon, which is ever present in my recollection, I was enjoying a walk with my friend in the City Park and reciting poetry. At that age I knew entire books by heart, word for word. One of these was Goethe’s “Faust.” The sun was just setting and reminded me of the glorious passage:” {Tesla Autobiography, Pg. 42}
[a]Very interesting proposal. Kaggle would be very interested in collaborating to bootstrap something like this and potentially fund elements of this – all we'd really need is the right (set of) start problem. One or two influential research folks to host/evangelize it and kickstart the participation would go a long way too.
Source: Original Google Doc