Explanations
names or words with accents or special characters
the character "r" in various forms, especially within names and phrases
Negative Logits
Token             Logit
rooting           -0.88
iating            -0.68
ollah             -0.65
sanctuary         -0.65
oys               -0.62
****************  -0.62
owered            -0.62
dividing          -0.61
ously             -0.60
osite             -0.60
Positive Logits
Token             Logit
ré                1.12
ré                0.95
é                 0.92
ité               0.89
nant              0.86
Ré                0.85
lé                0.84
Qué               0.82
miah              0.82
propos            0.82
Activations Density 0.007%