Last week DNASTAR’s Principal Scientist Steve Darnell participated in our very first Ask Me Anything, or AMA, on Reddit. An AMA is a crowdsourced interview where Reddit users can leave questions for the interviewee to answer and vote on other questions they would like to see answered. Steve answered over 50 questions, and topics ranged from career and academic advice to specific questions about the types of proteins he works with and the scripting languages he uses. You can find some of the most requested questions below or go to the Reddit site to find the full list of answered questions.
Question: When you first started your undergrad, did you set out with the intention of studying structural bioinformatics, or did you kind of stumble into your career like many of us did?
Answer: I stumbled into this area like most others. I started out thinking I would be a chemist. I added the biochemistry major late in college, which took me into an extra year, so I tried a few CS courses. In grad school, I didn’t get placed in a lab after rotations (we had a record class size). I was later placed with a first-year biochemistry professor with a cross appointment in math and a focus in structural biology… and the rest is history.
Question: VMD, PyMOL, or Chimera? And which do you think is more powerful under the hood? Is there any new software hitting the ‘market’ that handles trajectories well? And how big of a system have you run through at all-atom scale?
Answer: VMD was purpose-built to view molecular dynamics trajectories. All three of your options have scripting languages, which makes them all pretty powerful. If you’re a Python scripter like me, then PyMOL is the way to go. The plug-in system is nice for community contributions, but some plug-ins are easier to set up than others. Chimera has some wonderful visualization capabilities, including ambient occlusion in ChimeraX.
I am the leader of the development team behind Protean 3D, which is a 3D molecular visualization program combined with integrated bioinformatics and sequence analysis capabilities. It fills a slightly different need, but we’re continuing to move it forward.
I have some historical experience using MD, mostly running basic equilibrium simulations to sample structural diversity for other needs. We use different “simulation” techniques for predicting protein structures or docking proteins together, but the visualization tends to focus on the endpoints. Sorry I don’t have any more insight into tools that have strong trajectory support. I’d be interested in hearing what you find out!
Question: I’m currently an undergrad trying to decide now whether I want to follow a path into biostat[istics] or working in a wet lab. What concerns me, though, is that I really don’t have a strong background in Python or R coding. Did you start coding before entering post-secondary, or did you develop the skills you use while in college?
Also, would you recommend any useful “hubs” of information that those in the field generally go to for updates, other than sites like ScienceDaily, or to discuss the state of bioinformatics in a non-conference setting?
Answer: It’s great that you’re thinking ahead so early, so don’t worry about not having a strong coding background right now. I didn’t take a CS class until my last years of college (they taught me Java at the time). I dabbled for a while until I ended up in grad school in a biology/mathematics lab. That’s where I really got better with C/Python/bash/etc. It’s much easier to get good fast when you’re immersed in the work every day.
As for updates, I have some subreddit subscriptions that I rely on, and my Google News feed has become pretty good at knowing what I like scientifically. Mostly, though, I still rely on speaking with my friends and colleagues (even in a post-COVID world). As for structural biology, you can’t beat the annual 3DSig satellite conference at ISCB.
Question: What’s the coolest DNA sequence you’ve designed and for what purpose? How does CRISPR fit into your position? Are there any ethics considerations?
Answer: My current position really focuses on enabling other researchers to design protein sequences through software. Back in the day, I worked on a proof-of-concept to improve the binding of a protein called SMAD4 to a protein called Ski, both of which are part of the TGF-beta pathway (involved in apoptosis). That was accomplished by making site-directed mutations at 1-3 positions.
Right now I’m devoting a lot of time to designing antibodies with the aid of computer modeling and experimental screening. CRISPR doesn’t fit into those plans, so I don’t have to worry about the ethics of gene editing for now.
Question: Do you use machine learning for this kind of work? If so, what specific algorithm types fit best?
Answer: Yes, we do, and the absolute best algorithms are… it depends. Random forests are still used a lot for classification and regression because they are general-purpose and relatively robust. We use them in part for modeling antibody loops onto framework structures. XGBoost is supposed to work wonders on tabular data, and I have some work in mind for that.
Deep learning (ResNet variants) has been applied successfully to the area of protein structure prediction, and there are growing examples of NLP being applied to creating a “grammar for protein sequences.” My current workstation was designed to let me start some proofs of concept in this area. I’m trying to carve out some time to join in the fun!
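For readers curious about the random forest idea mentioned in the answer above, here is a minimal, standard-library-only Python sketch. It is not Steve's or DNASTAR's method: the data, labels, and function names (`fit_forest`, `fit_stump`, `predict`) are all made up for illustration, and each "tree" is only a depth-one stump to keep the example short. It does show the two ingredients that define a random forest: every tree is trained on a bootstrap resample of the data, and every tree sees only a random subset of the features. The final answer is a majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Fit a depth-one tree: the best single threshold among feature_ids."""
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no split possible: always predict the majority class
        label = majority(labels)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # common heuristic: sqrt(n_features)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(range(n_features), k)        # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Made-up toy data: two numeric features, two well-separated classes.
ROWS = [(1.0, 1.2), (1.5, 0.8), (0.9, 1.1), (1.3, 1.4), (1.1, 0.7), (1.4, 1.0),
        (5.0, 5.2), (5.5, 4.8), (4.9, 5.1), (5.3, 5.4), (5.1, 4.7), (5.4, 5.0)]
LABELS = ["low"] * 6 + ["high"] * 6

forest = fit_forest(ROWS, LABELS)
print(predict(forest, (0.5, 0.5)), predict(forest, (6.0, 6.0)))
```

In practice you would reach for an established library rather than hand-rolled stumps; the point here is only to make the bootstrap-plus-voting structure visible.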