The reason we have RAG isn't that context windows are tiny. We have RAG because the LLM will never be trained on my private or personal data. It has no idea how I should organize the shoes in my closet, any more than it knows how to optimize my robotics team's drivetrain for a field that I just put together. It won't have access to the tools in my garage, or to what was in my refrigerator last week, to help me figure out what could have made the dog sick.
RAG exists to make up for missing data, not for a small context window. There will always be data that the LLM doesn't have available to it, and until you can fit the entire universe of data across every instant of time into a context window, you will need a RAG approach to provide that data to the LLM. That's never going to change.
When you shove absurd amounts of data into an LLM's context window, you are literally forcing it to look for a needle in a haystack.
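To make the point concrete, here is a rough Python sketch of the retrieval half of RAG: pick the one private note that matters and put only that in the prompt, instead of handing the model the whole haystack. Everything in it is made up for illustration: the notes, the function names (`tokenize`, `retrieve`, `build_prompt`), and the scoring, which is plain word overlap standing in for the embedding search a real system would likely use. It stops at building the prompt and doesn't call any model.

```python
import re
from collections import Counter

# Hypothetical private notes the model was never trained on.
PRIVATE_NOTES = [
    "Fridge last week: leftover chicken, grapes, cheddar, an open can of tuna.",
    "Garage tools: torque wrench, cordless drill, soldering iron, bench vise.",
    "Robotics drivetrain: 6-wheel tank drive, 10:1 gearbox, field has a 15 degree ramp.",
    "Closet: 4 pairs of running shoes, 2 pairs of boots, 1 pair of dress shoes.",
]


def tokenize(text):
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, notes, k=1):
    """Return up to k notes that share the most words with the query.

    Word overlap is an assumption made for brevity; notes with no
    shared words are dropped.
    """
    query_words = tokenize(query)
    scored = [(sum((query_words & tokenize(note)).values()), note) for note in notes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note for score, note in scored[:k] if score > 0]


def build_prompt(query, notes, k=1):
    """Put only the retrieved notes in front of the question."""
    context = "\n".join(retrieve(query, notes, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "What was in the fridge last week that could have made the dog sick?"
    print(build_prompt(question, PRIVATE_NOTES))
```

The prompt that comes out contains the fridge note and nothing about the garage, the closet, or the robot, so the model gets the needle without the haystack.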