Flow-Based Policy for Online Reinforcement Learning