We developed the core components of the AI-aided multiple time stepping algorithm for multiscale modeling of cell dynamics. The algorithm was implemented and analyzed on two supercomputer architectures, with the application of simulating the aggregation of 250 platelets, or 102 million particles. To scale on these computers, which combine complex memory and network architectures with GPUs, we devised a biomechanics-informed task mapping scheme that reduces load imbalance and optimizes communications and memory utilization. Our simulations scaled well up to 192 nodes on a Summit-like supercomputer with a peak speed of 11 petaflops and achieved a rate of 423 μs/day, which is 500 times faster than the conventional algorithm using a static time step. This has enabled studies of record-size blood clots at record spatial–temporal resolutions. Additionally, we found that scalability and execution time depend sensitively on the methods of decomposition, CPU–GPU coupling, and task mapping.