Data was harvested in October 2023 (previously April 2020) using SPARQL queries (they might need some improvement, but they work for this proof of concept!):
Combining each forename with each surname in the database would produce billions of names (38,342,341,920, to be exact), which is quite complicated to process, especially on a small server like this one. After thinking about it for some time and asking for advice on Stack Overflow, I settled on the following solution.
The database consists of two tables, one for the forenames and one for the surnames. Each table has one column for the name, one column for the length of the name, and 26 columns, one for each letter, with an index on each of these columns. When you enter a name on this website, a request is sent to the server asking for anagrams whose forename contains four characters (forenames with fewer than four characters are excluded). The SQL query that is built looks like this (direct link to this query on GitHub):
When the result for 4-character forenames is returned, a new query is sent for 5-character forenames, and so on. By splitting the query into multiple requests, we are able to get the final result in a reasonable time.
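To make the idea more concrete, here is a minimal sketch of the approach in Python with an in-memory SQLite database. It is not the code running on this site: the table and column names, the helper functions (`letter_counts`, `build_db`, `find_anagrams`), the sample names, and the upper bound of the loop are all assumptions made for illustration. It also assumes that each of the 26 letter columns stores how many times that letter occurs in the name.

```python
import sqlite3
from collections import Counter
from string import ascii_lowercase

LETTERS = ascii_lowercase  # one column per letter, as described above


def letter_counts(name):
    """Return 26 integers: occurrences of a..z in name (other characters ignored)."""
    counts = Counter(ch for ch in name.lower() if ch in LETTERS)
    return [counts.get(ch, 0) for ch in LETTERS]


def build_db(forenames, surnames):
    """Create the two tables (hypothetical schema) and fill them."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"{ch} INTEGER NOT NULL" for ch in LETTERS)
    placeholders = ", ".join("?" * (2 + len(LETTERS)))
    for table, names in (("forenames", forenames), ("surnames", surnames)):
        conn.execute(
            f"CREATE TABLE {table} (name TEXT, length INTEGER NOT NULL, {cols})"
        )
        for ch in LETTERS:
            conn.execute(f"CREATE INDEX idx_{table}_{ch} ON {table} ({ch})")
        rows = []
        for name in names:
            counts = letter_counts(name)
            rows.append((name, sum(counts), *counts))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn


def find_anagrams(conn, text, min_forename_len=4):
    """Run one query per forename length, starting at min_forename_len."""
    target = letter_counts(text)
    total = sum(target)
    # For every letter, forename count + surname count must equal the input count.
    letter_conditions = " AND ".join(f"fn.{ch} + sn.{ch} = ?" for ch in LETTERS)
    query = (
        "SELECT fn.name, sn.name FROM forenames AS fn "
        "JOIN surnames AS sn ON sn.length = ? "
        f"WHERE fn.length = ? AND {letter_conditions}"
    )
    results = []
    # Assumption: stop once no letter would be left for the surname.
    for forename_len in range(min_forename_len, total):
        params = (total - forename_len, forename_len, *target)
        results.extend(conn.execute(query, params).fetchall())
    return results


if __name__ == "__main__":
    conn = build_db(["Mary", "Anna", "Brian"], ["Smith", "Jones"])
    print(find_anagrams(conn, "army smith"))
```

Because each query fixes the forename length (and therefore the surname length), the server only has to compare a small slice of the two tables per request instead of all forename and surname combinations at once.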