I was experimenting with Python for a personal data project. My goal was to play around with some statistics functions, maybe train a few models, and so on. Unfortunately, most of my data needs a lot of preprocessing before it's of any use to NumPy & co.: mostly picking the right objects out of a stream of compressed JSON data and turning them into time series. All of that is easy to do in Python, which is what I'd expect from a language used this much in data science (or so I've heard).
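To make the kind of preprocessing I mean concrete, here is a minimal sketch, using only the standard library. The record layout (`"ts"` plus a named field) and the field names are made up for illustration; the actual data looks different.

```python
# Hypothetical sketch of the preprocessing step: pull matching records
# out of a gzip-compressed stream of newline-delimited JSON and turn
# them into a (timestamp, value) time series. Field names are invented.
import gzip
import io
import json

def extract_series(stream, key):
    """Yield (timestamp, value) pairs for records that carry `key`."""
    for line in stream:
        record = json.loads(line)
        if key in record:
            yield record["ts"], record[key]

# Tiny in-memory example standing in for a real compressed file.
raw = b"\n".join(json.dumps(r).encode() for r in [
    {"ts": 0, "temp": 21.5},
    {"ts": 1, "humidity": 0.4},
    {"ts": 2, "temp": 22.1},
])
with gzip.open(io.BytesIO(gzip.compress(raw)), "rt") as fh:
    series = list(extract_series(fh, "temp"))
print(series)  # [(0, 21.5), (2, 22.1)]
```

Trivial to write, and exactly the sort of loop that gets run over and over during trial and error.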
But I do have a lot of data, and there's a lot of trial and error involved, so I'm not running these tasks just once. I would have appreciated a speedup of up to 16x. I don't know about "high performance", but that's the difference between a short coffee break and going to lunch.
And if I were working on an actual data-oriented workstation, I would be using only one of possibly 100+ cores.
That just seems silly to me.