My current research focuses on developing a better understanding of the areas in which machine learning and theoretical neuroscience overlap.

Put another way: how can one leverage machine learning in neuroscience, and, of course, where can neuroscience knowledge be leveraged in machine learning?

Machine Learning and Theoretical Neuroscience

I am currently pursuing my interest in the connections between neuroscience and machine learning.

Of course, machine learning can be applied to the massive data sets that come out of any discipline; it doesn't have to be neuroscience. I'm considerably more interested in using machine learning theory to understand how the brain works.

Some Questions/Ideas I am interested in:

  • How does memory work?
    • How can short-term/working memory be so dynamic while long-term memory seems fixed? (Yes, they are in different parts of the brain, but besides the obvious.)
    • Does retrieval of memories reinforce or degrade synaptic weights in an expected way? What does this mean for learning algorithms?
  • How is the idea of an object encoded separately from the specific examples that are known?
  • Neuroscience says we have a seemingly unlimited capacity for long-term memory storage. If this is true, there must be an efficient encoding and consolidation mechanism...
  • Gain modulation's effect on learning
  • Emotions affect memory by releasing chemicals in specific areas of the brain. Can this reinforcement be modeled in a constructive way?
  • Can we model the palimpsest property in deep networks and watch the degradation?
  • ...and many others :)


But what am I doing right now?

I'm currently working at the intersection of computer vision and neuroscience, specifically on the neuronal activation patterns of boid snakes. We would like to do 3D tracking and motion capture of these snakes during strikes, but most of the motion capture systems on the market today use infrared technology and have a maximum frame rate of 200 Hz. To capture the strike of these snakes, we would need frame rates as high as 700 or 800 Hz!

To tackle this problem, I'm building a multiple-camera system that allows me to do high-speed motion capture in color. Currently the system is only useful for tracking markers, but I'm working on a markerless approach. This movement information, combined with EMG recordings of muscle activation, will allow me to create a model of the striking movement of these snakes.
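At its core, marker tracking with multiple cameras reduces to triangulating each marker's 3D position from its 2D image coordinates in several calibrated views. As a minimal sketch (not the actual pipeline; the function name is my own, and the camera projection matrices are assumed to be known from calibration), linear DLT triangulation from two views looks like this:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices (from calibration).
    x1, x2: (u, v) image coordinates of the same marker in each view.
    Returns the estimated 3D point in world coordinates.
    """
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

With more than two cameras, the same construction simply stacks two rows per view, which also makes the estimate more robust to noise in any single camera.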

But why do we care about these snakes, you may ask? Well, this type of snake has some amazing abilities. It lives in trees and hunts birds, bats, and other flying creatures. To do this, it wraps its tail around the tree, or a tree branch, and cantilevers the rest of its body out into free space. In some snakes, the portion of the body hanging in free space can be up to 70% of the body! The snake is then able to stay motionless for hours until its prey flies by, at which point it quickly springs into action and strikes. If you've ever tried to hold your arm up for a long time, you know that the arm isn't capable of any fast movement for a while afterwards.

So how are these snakes able to do this? This is one of the questions we are trying to shed light on. Other questions involve the vestibular system of the snake during the strike, as well as a multitude of interesting mathematical problems where I hope to use machine learning to good effect.

Previous Research

CNN-based Segmentation of Medical Imaging Data

Convolutional neural networks have been applied to a wide variety of computer vision tasks. Recent advances in semantic segmentation have enabled their application to medical image segmentation. While most CNNs use two-dimensional kernels, recent CNN-based publications on medical image segmentation have featured three-dimensional kernels, allowing full access to the three-dimensional structure of medical images. Though closely related to semantic segmentation, medical image segmentation includes specific challenges that need to be addressed, such as the scarcity of labelled data, the high class imbalance found in the ground truth, and the high memory demand of three-dimensional images. In this work, a CNN-based method with three-dimensional filters is demonstrated and applied to hand and brain MRI. Two modifications to an existing CNN architecture are discussed, along with methods for addressing the aforementioned challenges. While most of the existing literature on medical image segmentation focuses on soft tissue and the major organs, this work is validated on data from both the central nervous system and the bones of the hand. Link to Paper on arXiv
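One widely used way to handle the class imbalance mentioned above (a common approach in the segmentation literature, not necessarily the specific method used in this work) is to score segmentations by a soft Dice overlap rather than plain per-voxel accuracy, so that a tiny foreground structure is not drowned out by the background. A minimal NumPy sketch:

```python
import numpy as np

def soft_dice(pred, target, eps=1e-7):
    """Soft Dice coefficient between a predicted probability volume and a
    binary ground-truth volume.

    Works for arrays of any dimensionality, including the 3D volumes
    typical of MRI data. Returns a value in (0, 1]; 1 - soft_dice can be
    used as a loss that is insensitive to the foreground/background ratio.
    """
    pred = pred.astype(np.float64).ravel()
    target = target.astype(np.float64).ravel()
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

Because both numerator and denominator scale with the size of the foreground, a structure occupying 0.1% of the volume contributes to the score just as strongly as one occupying half of it.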

Approximate Bayesian Inference in Generative Topic Models

This was my Master's Research done at the University of Minnesota.

My research examined equivalences between probabilistic topic models. The most significant finding was that an inference method that works for one model is not necessarily tractable for an equivalent, structurally very similar model: in this case, approximate Bayesian inference could be applied to one model but was intractable for its equivalent counterpart.

A topic model in this paper is a generative model for documents: it specifies a simple probabilistic procedure by which documents can be generated. To make a new document, one chooses a distribution over topics. Then, for each word in that document, one chooses a topic at random according to this distribution, and draws a word from that topic. Standard statistical techniques can be used to invert this process, inferring the set of topics that were responsible for generating a collection of documents. Although it is outside the scope of this work, these same methods can also be applied to other data sources, such as images, genetic data, video, or social networks. These topic models work on the assumption that documents contain a mixture of topics, and that within each topic certain words are more prevalent than others.
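The generative process described above can be sketched in a few lines. The vocabulary and topic distributions below are toy values of my own choosing, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and two topics (distributions over words).
vocab = ["gene", "dna", "cell", "ball", "goal", "team"]
topics = np.array([
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],   # a "biology" topic
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],   # a "sports" topic
])

def generate_document(n_words, alpha=(1.0, 1.0)):
    """Generate one document: draw a per-document topic mixture,
    then for each word draw a topic, then a word from that topic."""
    theta = rng.dirichlet(alpha)                 # topic mixture for this document
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=theta)     # topic for this word
        w = rng.choice(len(vocab), p=topics[z])  # word drawn from that topic
        words.append(vocab[w])
    return words

doc = generate_document(8)
```

Inference runs this story in reverse: given only the documents, it recovers plausible `topics` and per-document mixtures `theta`.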

This paper looks at two models, Gamma-Poisson (GaP) and Latent Dirichlet Allocation (LDA), as well as a model known as Discrete Component Analysis (DCA) that attempts to create a general and equivalent form of both. The methods discussed in this paper are described both graphically and algorithmically to help showcase the similarities and differences between the models.

Reinforcement Learning of a Markov Adversarial Game through Stochastic Fictitious Play

This paper started out as a group project for an Artificial Intelligence class; the co-authors and I pursued the idea further after the class ended.


Our work models a game in which police agents play mixed strategies to catch drivers who decide to exceed the speed limit. We are interested in understanding how agents use decision theory and game theory to decide on which roads to speed and on which not to. Additionally, we are interested in which kinds of police deployment are most useful for certain geographies.

Our game is modeled as a graph. Driver agents plan a path from origin to destination and traverse the graph to reach the destination; police agents seek nodes that maximize their probability of catching speeding drivers. The game is modeled as a Bayesian Stackelberg game: the leader (the police agent) commits to a strategy first, and, given the police strategy, the follower (the driver agent) selfishly chooses, with some probability, the strategy that maximizes its profit. In turn, the leader may adjust its strategy in response to the follower's play, so as to catch the follower. We make use of two different learning models to simulate behavior: Opponent Modeling Q-Learning and Experience Weighted Attraction (EWA).

Opponent Modeling Q-Learning (OM) allows a player to take advantage of the suboptimal moves an opponent may make during a game. OM uses all of the same information as Minimax Q-Learning, but also keeps track of how many times the opponent chooses each action in each state. This extra information allows the player to cope with an opponent that does not try each move from a state infinitely often, which is a problem for Minimax Q-Learning.
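The core of the opponent-modeling idea can be sketched as follows (illustrative only, with array shapes and names of my own choosing): for a given state, the player weights the joint Q-values by the empirical distribution of the opponent's past actions, instead of assuming a worst-case minimax opponent.

```python
import numpy as np

def om_value_and_policy(Q, counts):
    """Opponent-modeling evaluation of a single state.

    Q:      shape (n_actions, n_opponent_actions); Q[a, o] is the value of
            playing action a when the opponent plays o in this state.
    counts: shape (n_opponent_actions,); how often the opponent has played
            each action o in this state so far.
    Returns the state value and the greedy action under the opponent model.
    """
    p_opp = counts / counts.sum()   # empirical model of the opponent
    expected = Q @ p_opp            # E_o[Q(a, o)] for each own action a
    best = int(np.argmax(expected))
    return expected[best], best
```

Against an opponent that, say, almost always plays its first action, this evaluation exploits that tendency, whereas minimax would defend against the worst case and leave value on the table.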

Experience Weighted Attraction is a learning model that combines two seemingly disparate learning models: belief learning and choice reinforcement. In a belief learning model, a player keeps track of the history of the other players' moves and develops a belief about how the other players act. Given these beliefs, the player chooses a best response that maximizes its expected payout. In a choice reinforcement model, the player assumes that the previous payouts of chosen strategies affect how a strategy is currently chosen. Most of the time players don't have a belief about how other players will play; they only care about the payouts received in the past, not how the play evolved to yield those payouts. These two learning models are usually treated as fundamentally different approaches, but EWA shows that they are related: EWA creates a model in which both are special cases. This allows EWA to learn from actions and experiences that are not directly reinforced by the choice of action in each step.
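The standard EWA attraction update can be sketched as follows (parameter names follow the usual EWA formulation; the default values here are illustrative, not those used in our experiments). The parameter delta is what bridges the two models: at delta = 0 only the chosen strategy is reinforced by its realized payout, while at delta = 1 every strategy is updated by the payout it would have earned, as in belief learning.

```python
import numpy as np

def ewa_update(A, N, chosen, payoffs, phi=0.9, rho=0.9, delta=0.5):
    """One Experience Weighted Attraction update for a single player.

    A:       attractions, one per strategy.
    N:       experience weight N(t-1).
    chosen:  index of the strategy actually played this round.
    payoffs: payout each strategy would have earned against the opponents'
             realized play (the chosen strategy's entry is the actual payout).
    phi decays old attractions, rho decays the experience weight, and delta
    interpolates between reinforcement (0) and belief learning (1).
    """
    N_new = rho * N + 1.0
    # Chosen strategy gets full weight; unchosen strategies get weight delta.
    weight = delta + (1.0 - delta) * (np.arange(len(A)) == chosen)
    A_new = (phi * N * A + weight * payoffs) / N_new
    return A_new, N_new
```

Strategies are then chosen with probabilities that increase in their attractions (e.g. a logit response), so higher-attraction strategies are played more often over time.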