Sarah Eslami

👋 Hi there, I’m Sarah!

I am a PhD student at the Hasso Plattner Institute in Potsdam, supervised by Prof. Gerard de Melo. I am also a scientific researcher in the KISZ-BB project, funded by the German Ministry for Education and Research.

I am interested in researching multi-modal learning problems, with a focus on the intersection of computer vision and natural language processing. We humans perceive and interact with the world through multi-modal input signals, e.g., by seeing, hearing, and touching. Therefore, I believe that to achieve true intelligence in AI, we need to develop systems that can perceive the world through multiple complementary modalities, e.g., by learning visual representations via understanding their corresponding explanatory textual descriptions.

So far in my research, I have mostly worked on contrastive approaches for vision-language pre-training of foundation models. I am also generally interested in vision-language generative models, especially diffusion-based approaches.

Previously, I was a software engineer at Data4Life in Potsdam and a research intern at SAP Security Research in Karlsruhe. I obtained an M.Sc. in Computer Science from Saarland University, and I carried out my master’s thesis in the Database and Information Systems Group at the Max Planck Institute for Informatics, advised by Prof. Asia Biega. Before my master’s, I obtained a B.Sc. in Computer Software Engineering from Amirkabir University of Technology in Tehran, and I carried out my bachelor’s thesis in the Computer Vision Group, advised by Prof. Mohammad Rahmati.