Snappy is a widely used (de) compression algorithm in many big data applications. Such a data compression technique has been proven to be successful to save storage space and to reduce the amount of data transmission from/to storage devices. In this paper, we present a fine-grained parallel Snappy decompressor on FPGAs running under a relaxed execution model that addresses the following main challenges in existing solutions. First, existing designs either can only process one token per cycle or can process multiple tokens per cycle with low area efficiency and/or low clock frequency. Second, the high read-After-write data dependency during decompression introduces stalls which pull down the throughput.