INDEX
Explanations
mentions of the name "Joseph."
Negative Logits
Token     Logit
684       -0.16
quit      -0.16
ends      -0.16
ieri      -0.15
icher     -0.15
añ        -0.14
rique     -0.14
beg       -0.14
enta      -0.14
kie       -0.14
Positive Logits
Token     Logit
ine       0.39
INE       0.28
Stalin    0.20
son       0.20
ina       0.19
ines      0.19
thal      0.18
INES      0.18
ине       0.17
McCarthy  0.16
Activation Density: 0.006%