The Conversational Tightrope: An Analysis of Large Language Model Performance Across Multi-Turn Dialogues