Detection of software bottlenecks which hinder utilizing hardware resources is a classic but complex problem due to the layered structures of the software bottlenecks. However, model-based approaches require a performance model given, which is impractical to maintain under today's agile development environment, and profile-based approaches do not handle the layered structures of the software bottlenecks. This paper proposes a novel approach of taking the best of both worlds which extracts a performance model from execution profiles of the target application to detect the layered bottlenecks. We collect a wake-up profile of threads, which samples an event that one thread wakes up another thread, and build a thread dependency graph to detect the layered bottlenecks. We implement our approach of profile-based detection of layered bottlenecks in the Go programming language. We demonstrate that our method can detect software bottlenecks limiting scalability and throughput of state-of-the-art middleware such as a web application server and a permissioned blockchain network, with small amount of the runtime overhead for profile collection.