Quantum computers are the next evolution of computing hardware. Quantum devices are being exposed through the same familiar cloud platforms used for classical computers, enabling seamless execution of hybrid applications that combine quantum and classical components. Quantum devices vary in features, e.g., number of qubits, quantum volume, CLOPS, noise profile, queuing delays, and resource cost. It may therefore be useful to split hybrid workloads that have either large quantum circuits or a large number of quantum circuits into smaller units. In this paper, we profile two workload splitting techniques on IBM's Quantum Cloud: (1) circuit parallelization, which splits one large circuit into multiple smaller ones, and (2) data parallelization, which splits a large number of circuits run on one hardware device into smaller batches of circuits run on different devices. These techniques can improve the utilization of heterogeneous quantum hardware, but they involve trade-offs. We evaluate them on two key algorithmic classes, Variational Quantum Eigensolver (VQE) and Quantum Support Vector Machine (QSVM), and measure the impact on circuit execution times, pre- and post-processing overhead, and quality of the result relative to a baseline without parallelization. Results are obtained on real hardware and complemented by simulations. We see that (1) VQE with circuit cutting is ~39\% better in ground state estimation than the uncut version, and (2) QSVM that combines data parallelization with a reduced feature set yields up to a 3x improvement in quantum workload execution time and reduces quantum resource use by 3x, while providing comparable accuracy. Error mitigation can improve the accuracy by ~7\% and the resource footprint by ~4\% compared to the best case among the considered scenarios.