Today is one year since I started working seriously on pg_chameleon.
With this commit I changed both the project's license, to the 2-clause BSD, and its scope, evolving the project into a MySQL to PostgreSQL replica system.
Initially this change was just an experiment. I needed to synchronise the data between MySQL and PostgreSQL, and at that time the only option I had was to use the MySQL foreign data wrapper, possibly copying the data locally every now and then. However, because the previous implementation relied on a MySQL replica, this approach wasn't really feasible.
If you are curious about the background story and how we scaled the analytics database at Transferwise you can read it here.
I developed pg_chameleon in my spare time. I like to think of it as my little commute project.
The first test on large datasets happened during the amazing days of PGConf.EU 2016. I remember how incredibly slow the process was, taking an unacceptable amount of time: four days to copy a 600GB database. I found the bottlenecks during the nights between the conference days and built a faster implementation.
I also had to cope with the SQL dialect conversion. The solution is still a work in progress.
Initially I decided to use an existing library, but after a few failures I realised that sqlparse didn't fit my needs.
So I took the occasion to learn how to use regular expressions, and I doubled my problems at the same time.
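To give an idea of what regex-based parsing of MySQL DDL can look like, here is a minimal sketch in Python. It is not the code used in pg_chameleon: the pattern `COLUMN_RE` and the function `parse_column` are hypothetical names made up for this example, and the pattern only handles a simple column definition.

```python
import re

# Hypothetical pattern: an optionally backquoted column name, a type name
# and an optional dimension in parentheses, e.g. `active` TINYINT(1).
COLUMN_RE = re.compile(
    r"^\s*`?(?P<name>\w+)`?\s+(?P<type>\w+)(?:\s*\((?P<dimension>[^)]*)\))?"
)


def parse_column(definition: str) -> dict:
    """Split a simple MySQL column definition into name, type and dimension."""
    match = COLUMN_RE.match(definition)
    if match is None:
        raise ValueError(f"cannot parse column definition: {definition!r}")
    return {
        "name": match.group("name"),
        # The type is lower-cased so later lookups are case insensitive.
        "type": match.group("type").lower(),
        # None when the definition has no parentheses after the type.
        "dimension": match.group("dimension"),
    }
```

Calling `parse_column("`active` TINYINT(1) NOT NULL")` returns the name `active`, the type `tinyint` and the dimension `1`. Anything beyond this simple shape (constraints, enum values containing parentheses, comments) would need more patterns, which is where the problems start to double.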
In May I presented the project at the Estonian PostgreSQL User Group and the video is available here.
Currently the project is at version 1.6, which improves the replay speed and comes with a better status view showing the replay lag alongside the read lag.
The upcoming release 1.7 will add an optional threaded mode for the replica, where the read and replay processes will run independently.
This version will also add support for type overrides during the init schema and the DDL replay. This change will make it simpler to use pg_chameleon as a migration tool (e.g. converting tinyint(1) into a boolean).
However, the current replay implementation can result in a broken replica if incompatible data is pushed into the data fields (e.g. inserting a value >1 into a tinyint(1) will throw a type error on PostgreSQL if the column is boolean). I'm working on a solution.
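The problem is easier to see with a small sketch in Python. This is only an illustration under my own assumptions, not the way pg_chameleon implements the override: the `TYPE_OVERRIDE` dictionary and the functions `pg_type_for` and `to_boolean` are hypothetical names invented for this example.

```python
from typing import Optional

# Hypothetical override map, not pg_chameleon's actual configuration format.
TYPE_OVERRIDE = {"tinyint(1)": "boolean"}


def pg_type_for(mysql_type: str) -> str:
    """Return the overridden type if one is defined, else the type unchanged."""
    return TYPE_OVERRIDE.get(mysql_type.lower(), mysql_type)


def to_boolean(value: Optional[int]) -> Optional[bool]:
    """Convert a tinyint(1) value into a boolean.

    Only NULL, 0 and 1 are accepted. MySQL happily stores other values
    in a tinyint(1), and those have no boolean equivalent.
    """
    if value is None:
        return None
    if value in (0, 1):
        return bool(value)
    raise ValueError(f"value {value!r} does not fit a boolean column")
```

With this sketch `to_boolean(1)` and `to_boolean(0)` convert cleanly, while `to_boolean(2)` raises an error. That is the same kind of failure the replay hits when MySQL sends a value the overridden column cannot hold.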
I've also started the development of version 2, but I haven't seriously kicked off the coding yet. The reason is that I'm still learning a lot of things thanks to the feedback I'm getting via GitHub.
I will start version 2 soon and hopefully I will release the first alpha by the beginning of next year.
However, I'm very happy to see pg_chameleon gaining popularity.
If my work helps even just one person to move from MySQL to PostgreSQL, I will feel satisfied.
So, happy birthday to pg_chameleon, my little pet project.