About - APF - Anagram Pseudo Finder

Where do the data come from?

Everything comes from Wikidata, the sister project of Wikipedia. All of this data is freely available under the CC0 license.

The data was harvested in ~~April 2020~~ October 2023 using SPARQL queries (they might need some improvement, but they work for this proof of concept!):

A first one to get all forenames that have a P1705 (native label) property: https://w.wiki/NCP
A second one, for all forenames not found by the previous query, to get their labels in some languages when they exist: https://w.wiki/NCR
A third one to get all surnames and their native labels: https://w.wiki/NCS

Are the strings used all real names?

Yes, they are, or at least that is how they are described on Wikidata. We cannot be sure that a given combination of forename and surname exists somewhere in the world, but every result is made of a real forename and a real surname.

How many names are used to compute the anagrams?

The database contains 75,220 forenames and 509,736 surnames.

How many names have been anagramized?

7,068 names are currently cached in the database.

How does it work?

Combining each forename with each surname in the database would result in tens of billions of names (38,342,341,920 to be exact), which is quite complicated to process, especially on a small server like this one. After thinking about it for some time and asking for advice on StackOverflow, I settled on the following solution.
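As a quick check of that figure, the product of the forename and surname counts given above can be computed directly. The variable names are only illustrative.

```python
# Counts taken from the "How many names" answer above.
forenames = 75_220
surnames = 509_736

# Every forename paired with every surname.
combinations = forenames * surnames
print(f"{combinations:,}")
```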

The database consists of one table for the forenames and one for the surnames. Each table contains one column for the name, one column for the length of the name, and 26 columns, one for each letter, with an index on each of these columns. When you input a name on this website, a request is sent to the server asking for anagrams with a forename containing four characters (forenames with fewer than 4 characters are excluded). The SQL query that is built looks like this (direct link to this query on GitHub):

When the result for 4-character forenames is returned, a new query is sent for 5-character forenames, and so on. By splitting the query into multiple requests, we are able to get the final result in a reasonable time.
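The approach described above can be sketched roughly as follows. This is not the query used on this site, only a minimal illustration built on several assumptions: an in-memory SQLite database, letter columns that each store how many times that letter occurs in the name, and the hypothetical helper names `letter_counts`, `build_db` and `anagrams`. One query is run per forename length, starting at 4.

```python
import sqlite3
import string
from collections import Counter

LETTERS = string.ascii_lowercase


def letter_counts(name):
    """Return 26 counts (a..z) for a name, ignoring any other character."""
    counts = Counter(ch for ch in name.lower() if ch in LETTERS)
    return [counts[ch] for ch in LETTERS]


def build_db(forenames, surnames):
    """Create both tables: name, length, and one indexed column per letter."""
    db = sqlite3.connect(":memory:")
    cols = ", ".join(f"{ch} INTEGER" for ch in LETTERS)
    marks = ", ".join("?" * 28)  # name + len + 26 letter counts
    for table, names in (("forenames", forenames), ("surnames", surnames)):
        db.execute(f"CREATE TABLE {table} (name TEXT, len INTEGER, {cols})")
        for ch in LETTERS:
            db.execute(f"CREATE INDEX {table}_{ch} ON {table} ({ch})")
        rows = [(n, sum(c), *c) for n in names for c in [letter_counts(n)]]
        db.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    return db


def anagrams(db, name, min_forename_len=4):
    """Find (forename, surname) pairs that use exactly the letters of name."""
    target = letter_counts(name)
    total = sum(target)
    # The forename may not use more of any letter than the input has ...
    fits = " AND ".join(f"fn.{ch} <= ?" for ch in LETTERS)
    # ... and the surname must use exactly the letters that are left over.
    rest = " AND ".join(f"sn.{ch} = ? - fn.{ch}" for ch in LETTERS)
    sql = (
        "SELECT fn.name, sn.name FROM forenames fn JOIN surnames sn "
        f"ON sn.len = ? - fn.len AND {rest} "
        f"WHERE fn.len = ? AND {fits} ORDER BY fn.name, sn.name"
    )
    results = []
    # One query per forename length, mirroring the split into several requests.
    for n in range(min_forename_len, total):
        results += db.execute(sql, (total, *target, n, *target)).fetchall()
    return results


db = build_db(["dora", "road", "lena"], ["lane", "neal"])
pairs = anagrams(db, "Dora Lane")
```

With this toy data, `pairs` holds every forename and surname combination that rearranges the letters of "Dora Lane". Restricting each query to a single forename length keeps the candidate set small, which is the point of splitting the work.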

There is an issue. Who can I contact?

On Mastodon (symac@mamot.fr), or using the email address you'll find on this page.