The goal of metaphonebr is to simplify Brazilian names phonetically using a custom metaphoneBR algorithm that preserves ending vowels. It was created to aid dataset pairing in the absence of unambiguous keys.
The stable version of the package can be installed with:
install.packages("metaphonebr")
You can install the development version of metaphonebr from GitHub with:
# install.packages("remotes")
remotes::install_github("ipeadata-lab/metaphonebr")
This is a basic example which shows how to use the main function:
example_names <- c("João da Silva", "Maria", "Marya",
                   "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr::metaphonebr(example_names)
print(data.frame(original = example_names, metaphonebr = phonetic_codes))
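Since the package was created for dataset pairing, one natural use of the codes is as a join key. The following is a minimal sketch, not part of the package: `pair_by_phonetic_code()` and its `encode` argument are hypothetical names, and the two small data frames are made up for illustration. It assumes both data frames have a `name` column.

```r
# Hypothetical helper (not part of metaphonebr): joins two data frames on a
# phonetic key computed from their `name` columns.
# `encode` is any function that maps a character vector to codes of the same
# length, for example metaphonebr::metaphonebr.
pair_by_phonetic_code <- function(x, y, encode) {
  x$code <- encode(x$name)
  y$code <- encode(y$name)
  # Base R inner join on the shared key; the two name columns come back
  # as name.x and name.y
  merge(x, y, by = "code", suffixes = c(".x", ".y"))
}

# Made-up records, for illustration only
registry <- data.frame(name = c("Maria", "Helena", "Filipe"), id = 1:3)
survey   <- data.frame(name = c("Marya", "Elena", "Philippe"), score = c(7, 9, 8))

# Run the demo only when the package is installed
if (requireNamespace("metaphonebr", quietly = TRUE)) {
  print(pair_by_phonetic_code(registry, survey, metaphonebr::metaphonebr))
}
```

Whether a given pair of spellings ends up in the same row depends on the codes the package actually returns, so the joined result should be inspected before it is used for linkage.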
The metaphoneBR phonetic encoding algorithm proceeds as follows:

- LH is replaced by 1 (representing a palatal lateral approximant, like in “Filha” -> “FI1A”).
- NH is replaced by 3 (representing a palatal nasal, like in “Manhã” -> “MA3A”).
- CH is replaced by X (representing the /ʃ/ sound, like in “Chico” -> “XICO”).
- SH is replaced by X (for foreign names with the /ʃ/ sound, like in “Shirley” -> “XIRLEY”).
- SCH is replaced by X (approximating /ʃ/ or /sk/, like in “Schmidt” -> “XMIT”).
- PH is replaced by F (like in “Philip” -> “FILIP”).
- SC followed by E or I becomes S (like in “SCENA” -> “SENA”).
- SC followed by A, O, or U becomes SK (like in “ESCOVA” -> “ESKOVA”).
- QU or QÜ followed by E or I becomes K (e.g., “QUEIJO” -> “KEIJO”).
- GU or GÜ followed by E or I becomes G (the U is silent, e.g., “GUERRA” -> “GERRA”).
- QU becomes K (e.g., “QUANTO” -> “KANTO”).
- Ç is replaced by S.
- C followed by E or I is replaced by S (like in “CELSO” -> “SELSO”).
- C (not part of an already transformed digraph like CH or SC) is replaced by K (like in “CARLOS” -> “KARLOS”).
- G followed by E or I is replaced by J (like in “GELO” -> “JELO”; GUE/GUI are already handled).
- Q (that wasn’t part of QU) is replaced by K.
- W is replaced by V (the common Brazilian Portuguese pronunciation, e.g., “WALTER” -> “VALTER”).
- Y is replaced by I (e.g., “YARA” -> “IARA”).
- Z is replaced by S (e.g., “ZEBRA” -> “SEBRA”).
- X preceded by S has the X removed (e.g., “EXCELENTE” -> “ESELENTE”, to avoid a double /s/ representation from SKS).
- N is replaced by M (e.g., “JOAQUIN” -> “JOAQUIM”).
- AO is replaced by OM (e.g., “JOÃO” -> “JOOM”).
- ÃES is replaced by AES (e.g., “MÃES” -> “MAES”).
- Repeated adjacent letters (except the codes 1 for LH or 3 for NH) are reduced to a single letter (e.g., “CARRO” might become “CARO”, and “LESSA” becomes “LESA”). Note: this rule simplifies sounds like ‘RR’ and ‘SS’ to their single counterparts, which is a common Metaphone-style simplification.

The resulting code is an attempt to represent the phonetic signature of the name in a simplified, standardized way for a Brazilian Portuguese context. In particular, by construction it preserves ending vowels, since they generally carry gender information in Brazilian names (e.g., ADRIANO and ADRIANA).
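A few of the rules listed above can be sketched with regular expressions in base R. This is an illustrative sketch only, not the package's implementation: `sketch_rules()` is a hypothetical name, it covers only the LH, NH, CH, SH, PH, QU/GU before E or I, and repeated-letter rules, and it assumes input without accents.

```r
# Hypothetical sketch of a handful of the rules; not the package's code.
sketch_rules <- function(x) {
  x <- toupper(x)
  x <- gsub("LH", "1", x, fixed = TRUE)   # palatal lateral
  x <- gsub("NH", "3", x, fixed = TRUE)   # palatal nasal
  x <- gsub("CH", "X", x, fixed = TRUE)
  x <- gsub("SH", "X", x, fixed = TRUE)
  x <- gsub("PH", "F", x, fixed = TRUE)
  # QU / GU before E or I: drop the silent U and keep the vowel
  x <- gsub("QU([EI])", "K\\1", x, perl = TRUE)
  x <- gsub("GU([EI])", "G\\1", x, perl = TRUE)
  # Collapse runs of the same letter; the digit codes 1 and 3 are not matched
  x <- gsub("([A-Z])\\1+", "\\1", x, perl = TRUE)
  x
}

sketch_rules(c("Filha", "Chico", "Queijo", "Guerra", "Lessa"))
# returns c("FI1A", "XICO", "KEIJO", "GERA", "LESA")
```

Because each rule is a separate substitution, the order of the calls matters: digraphs are rewritten before the repeated-letter step, so “GUERRA” first becomes “GERRA” and only then “GERA”.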
metaphonebr is developed by a team of researchers at Instituto de Pesquisa Econômica Aplicada (Ipea).