INDEX
Explanations
mentions of the name "Andre."
Negative Logits
inct       -0.92
ulhu       -0.76
dfx        -0.72
ilial      -0.69
ead        -0.69
eled       -0.69
lished     -0.68
elled      -0.65
lishing    -0.64
plain      -0.64
Positive Logits
tti        1.10
essen      0.96
byss       0.80
cats       0.80
Vu         0.77
XIII       0.73
Paste      0.71
Kov        0.70
aic        0.69
Rus        0.69
Activations Density: 0.004%