INDEX
Explanations
words with accents and diacritics
the character "ç"
Negative Logits
mileage         -0.62
************    -0.61
masturb         -0.60
Kimmel          -0.60
Mandela         -0.59
thumbs          -0.58
Loll            -0.58
consent         -0.58
counting        -0.57
Shed            -0.56
Positive Logits
ão              1.23
ĩ               1.17
oÄŁ             1.11
İ               1.05
Ģ               1.02
oise            0.99
ais             0.97
ional           0.96
ĭ               0.95
į               0.92
Activations Density: 0.030%