Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators