Sarah Eslami
👋 Hi there, I’m Sarah!
I am a PhD student at the Hasso Plattner Institute in Potsdam, supervised by Prof. Gerard de Melo. I am also a scientific researcher on the KISZ-BB project, funded by the German Ministry for Education and Research.
I am interested in multi-modal learning problems, with a focus on the intersection of computer vision and natural language processing. We humans perceive and interact with the world through multi-modal input signals, e.g., by seeing, hearing, and touching. I therefore believe that, to achieve true intelligence in AI, we need to develop systems that can perceive the world through multiple complementary modalities, e.g., by learning visual representations via understanding their corresponding explanatory textual descriptions.
During my PhD, I have worked on the efficiency of generative multimodal LLMs (MLLMs) for high-resolution images, as well as on contrastive approaches for vision-language pre-training of foundation models. I am currently a research scientist intern at Jina AI, focusing on production-ready APIs for efficient MLLMs.
Previously, I was a software engineer at Data4Life in Potsdam and a research intern at SAP Security Research in Karlsruhe. I obtained an M.Sc. in Computer Science from Saarland University and carried out my master’s thesis in the Database and Information Systems Group at the Max Planck Institute for Informatics, advised by Prof. Asia Biega. Before my master’s, I obtained a B.Sc. in Computer Software Engineering from Amirkabir University of Technology in Tehran and carried out my bachelor’s thesis in the Computer Vision Group, advised by Prof. Mohammad Rahmati.