SWAN: Preprocessing SGD Enables Adam-Level Performance On LLM Training With Significant Memory Reduction