Browse Reinforcement Learning for Large Language Models