Explanations
references to specific individuals, particularly those named Antonio or related terms
Negative Logits
æŀľ        -0.18
ewood      -0.17
çİĩ        -0.17
hetto      -0.16
illard     -0.16
füg        -0.16
ÑĩаÑĤ     -0.16
алеж       -0.16
tring      -0.15
arton      -0.15
Positive Logits
icrobial   0.20
elope      0.20
ipation    0.18
eced       0.17
uario      0.17
uitive     0.16
ecess      0.16
iet        0.15
onyms      0.15
onio       0.15
Activations Density 0.041%