What's the best way to store page hits from a website?

Fordyce · February 21, 2020

I've been using PHP + MySQL to store page hits in a database.

But querying daily/last week/last month hits quickly becomes slow as the data grows into millions of rows.

Is there a recommended tech stack for this?

For now I pull data from Google Analytics, but that is not a solution because I have members who write articles and need to see the analytics for their own articles.

Franck · February 21, 2020

Okay, so you shouldn't update the data live. What you should do is have a local service that receives the requests in a service queue. The service queue caches the recent data and eventually writes it to long-term storage (be it a database, a text file, whatever fits your need). This reduces the load tremendously, and the service can easily crunch the data and keep it up to date at a constant speed. When you query the result, you get the long-term data plus whatever the service queue hasn't compiled into it yet.
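To make the idea concrete, here is a minimal sketch in Python of that kind of buffer. It is only an illustration under assumptions: the class name HitBuffer, the daily_hits table, and the flush_every threshold are made up for the example, and SQLite stands in for whatever long-term storage you actually use.

```python
import sqlite3
from collections import Counter


class HitBuffer:
    """Collects hits in memory and writes aggregated daily counts in batches."""

    def __init__(self, conn, flush_every=1000):
        self.conn = conn
        self.flush_every = flush_every  # hypothetical threshold, tune to taste
        self.pending = Counter()        # (page, day) -> hits not yet written
        self.pending_count = 0          # total hits waiting in memory
        conn.execute(
            "CREATE TABLE IF NOT EXISTS daily_hits ("
            "page TEXT, day TEXT, hits INTEGER, PRIMARY KEY (page, day))"
        )

    def record(self, page, day):
        """Register one hit in memory; flush once enough have piled up."""
        self.pending[(page, day)] += 1
        self.pending_count += 1
        if self.pending_count >= self.flush_every:
            self.flush()

    def flush(self):
        """Write one aggregated row per (page, day) instead of one row per hit."""
        with self.conn:  # single transaction for the whole batch
            for (page, day), hits in self.pending.items():
                self.conn.execute(
                    "INSERT OR IGNORE INTO daily_hits (page, day, hits) "
                    "VALUES (?, ?, 0)",
                    (page, day),
                )
                self.conn.execute(
                    "UPDATE daily_hits SET hits = hits + ? "
                    "WHERE page = ? AND day = ?",
                    (hits, page, day),
                )
        self.pending.clear()
        self.pending_count = 0

    def total(self, page, day):
        """Long-term count plus whatever has not been written yet."""
        row = self.conn.execute(
            "SELECT hits FROM daily_hits WHERE page = ? AND day = ?",
            (page, day),
        ).fetchone()
        return (row[0] if row else 0) + self.pending[(page, day)]
```

Because the table holds one row per page per day rather than one row per hit, a last-week or last-month report only has to sum a handful of rows. In a real service you would also flush on a timer and on shutdown so buffered hits are not lost.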

The principle is very similar to how some second-generation databases such as RavenDB work, or how MSMQ works. It is also very similar to how YouTube view counts work. There are many, many servers serving the same video, and each has its own count, but they don't all write the count to one database every time. They save to the main data storage once in a while, and the rest of the time they only share their numbers with each other from time to time. That explains why, when you open the same video and it loads from 2 different servers, the counts are close but different: the servers have probably not synced yet.
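A toy model of that behaviour, just to show why two servers can report different numbers. This is a simplified sketch, not how YouTube is actually implemented: MainStore and ViewServer are hypothetical names, and here the servers sync only through the main store rather than with each other.

```python
class MainStore:
    """Stands in for the main data storage holding the synced total."""

    def __init__(self):
        self.total = 0


class ViewServer:
    """One of many servers; counts locally and syncs only occasionally."""

    def __init__(self, store):
        self.store = store
        self.unsynced = 0                    # hits counted since the last sync
        self.last_seen_total = store.total   # total as of the last sync

    def hit(self):
        self.unsynced += 1                   # no write to the main store here

    def displayed_count(self):
        # What a visitor served by this server sees.
        return self.last_seen_total + self.unsynced

    def sync(self):
        # Push local hits to the main store, then refresh the local view.
        self.store.total += self.unsynced
        self.unsynced = 0
        self.last_seen_total = self.store.total


store = MainStore()
a, b = ViewServer(store), ViewServer(store)
for _ in range(3):
    a.hit()
for _ in range(2):
    b.hit()
a.sync()
b.sync()
# a still shows the total from its own last sync, b already includes a's hits.
print(a.displayed_count(), b.displayed_count())
```

Each server only touches the main store when sync() runs, so the write load depends on how often you sync and not on how many hits come in.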

Fordyce · February 24, 2020

I see, thanks for the input