Background: AI-driven models have transformed the materials discovery process through both predictive and generative modeling. However, most of these models remain limited in utility because they are isolated in three respects: (i) the data modality used for training, (ii) specificity to a particular material domain, and (iii) a focus on independent application tasks. This isolation curtails knowledge sharing across models, restricts access to diverse datasets, and fosters redundancy as analogous models are developed in parallel worldwide. These constraints reflect the inefficiencies of recent AI development in the materials and chemistry sector. Objective: We introduce a Foundation Model (FM) tailored for material discovery that seeks to overcome these limitations. The FM is pre-trained on massive datasets across multiple modalities, such as SMILES, property tables, and spectra, spanning diverse material domains (e.g., electronics, polymers, pharmaceuticals). By encoding generalized knowledge and representations in its latent space, the FM serves as a versatile foundation for numerous downstream applications, including predictive analysis and material generation. Methodology: To capture multi-modal representations, our FM uses a late-fusion scheme, which aligns representation vectors from distinct modalities in a shared latent space. This is achieved by pre-training modality-specific autoencoders (e.g., for SMILES) and subsequently aligning each modality’s latent space through contrastive learning on the shared latent space. We constructed three modality-specific models, trained with self-supervised learning on SELFIES, DFT property tables, and UV/Vis optical absorption spectra. Data were curated from public databases such as PubChem and ZINC, in conjunction with DFT-simulated datasets, totaling over 10 billion samples.
The models were then integrated into a fusion model that projects the representations into a shared latent space. Results and Demonstration: Our FM exhibited strong performance across diverse downstream tasks, including material property prediction and generative applications on the pre-training datasets. We will also show the FM’s efficacy in predicting properties on separate datasets by fine-tuning external predictive models connected to the FM’s shared latent space. Additionally, we will present a multi-modal conversational graphical user interface (GUI) for human interaction. The GUI integrates text-based interaction with a multi-modal input panel featuring a molecular editor, a property table editor, and a spectrum drawer. This interface opens new possibilities for human-model collaboration within materials science and chemistry. Conclusion: This research introduces a Foundation Model for material discovery that bridges modalities and domains. Through its versatile architecture and intuitive human interface, it paves the way for more efficient and interactive materials and chemistry research.
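
Illustrative sketch: the Methodology describes aligning modality-specific latent vectors through contrastive learning on a shared latent space. The abstract does not state which contrastive objective is used, so the example below assumes a symmetric InfoNCE (CLIP-style) loss as one common choice. The function names, the temperature value, and the toy vectors are hypothetical, and plain Python is used only to keep the sketch self-contained; a real implementation would likely use a tensor library.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_alignment_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss between two batches of paired embeddings.

    z_a[i] and z_b[i] are assumed to be embeddings of the same molecule from
    two modalities, already projected to the shared latent dimension.
    """
    a = [l2_normalize(v) for v in z_a]
    b = [l2_normalize(v) for v in z_b]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the a-to-b and b-to-a matching losses.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch: two molecules, each embedded by two hypothetical modality encoders.
z_structure = [[1.0, 0.0], [0.0, 1.0]]
z_spectrum_matched = [[2.0, 0.0], [0.0, 3.0]]   # same directions: well aligned
z_spectrum_swapped = [[0.0, 3.0], [2.0, 0.0]]   # pairs mismatched

print(contrastive_alignment_loss(z_structure, z_spectrum_matched))
print(contrastive_alignment_loss(z_structure, z_spectrum_swapped))
```

Under this assumed objective, correctly paired embeddings yield a loss close to zero, while mismatched pairs are penalized heavily; minimizing the loss therefore pulls the representations of the same material from different modalities toward the same point in the shared latent space.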