INDEX
Explanations
linguistic patterns related to language translation or analysis
words or phrases related to cultural or social identity, particularly those with linguistic variation
Negative Logits
Token        Logit
nen          -0.84
Kardashian   -0.70
abad         -0.67
oway         -0.66
AAP          -0.64
Higgins      -0.62
ney          -0.60
bek          -0.60
sha          -0.60
pick         -0.59
Positive Logits
Token        Logit
issance      1.21
uthor        1.20
hedral       1.09
ñ            1.08
isance       1.06
ç            1.03
ire          1.03
BILITIES     1.03
ires         0.97
BILITY       0.95
Activations Density 0.121%