INDEX
Explanations
references to significant public figures and their actions
Negative Logits
ahun   -0.18
uong   -0.16
ea     -0.16
ÌĨ     -0.15
brook  -0.15
æŃ¯    -0.15
eum    -0.15
REA    -0.14
esiz   -0.14
ago    -0.14
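Some tokens in these lists (ÌĨ, æŃ¯, and Äģ, Åį among the positive logits) look garbled. The likely cause is that they are printed in the byte-level BPE alphabet used by GPT-2-style tokenizers, which maps every raw byte to a printable character. Assuming the model uses that tokenizer (the dashboard does not say so), the minimal sketch below reverses the mapping. It rebuilds GPT-2's byte-to-character table and decodes the bytes as UTF-8. The function names are illustrative, not a library API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's reversible byte -> printable-character table."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Bytes without a printable form are shifted to code points 256, 257, ...
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text (hypothetical helper)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # errors="replace": a single token may hold only part of a multi-byte character
    return raw.decode("utf-8", errors="replace")


for tok in ["ÌĨ", "æŃ¯", "Äģ", "Åį", "â"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

With this table, "æŃ¯" decodes to "歯", "Äģ" to "ā", "Åį" to "ō", and "ÌĨ" to a combining breve (U+0306). A lone "â" becomes the replacement character U+FFFD, because it is only the first byte of a three-byte UTF-8 sequence.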
Positive Logits
,      0.17
Äģ     0.16
,↵     0.15
Åį     0.15
asal   0.15
â      0.15
fi     0.15
{{{    0.15
iod    0.14
تÙĩ    0.14
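Dashboards of this kind commonly produce the positive and negative logit lists in one way. They project the feature's decoder direction through the model's unembedding matrix, then report the tokens with the most negative and most positive scores. The source does not state how these values were computed, so treat this as an assumption; real dashboards may also apply normalization. In the sketch below, `decoder_direction`, `unembed`, and `vocab` are hypothetical toy inputs, and `logit_effects` is an illustrative name.

```python
def logit_effects(decoder_direction, unembed, vocab, k=2):
    """Return the k most negative and k most positive (token, effect) pairs.

    decoder_direction: d_model floats, the feature's output direction.
    unembed: d_model rows, each a list of len(vocab) floats (W_U).
    """
    # effect[t] = sum_j direction[j] * W_U[j][t]
    effects = [
        sum(d * row[t] for d, row in zip(decoder_direction, unembed))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy example: d_model = 2, vocabulary of three tokens.
neg, pos = logit_effects([1.0, -0.5],
                         [[0.2, -0.1, 0.0],
                          [0.1, 0.3, -0.2]],
                         ["a", "b", "c"])
print("negative:", neg)
print("positive:", pos)
```

In the toy example, token "b" gets 1.0 × (-0.1) + (-0.5) × 0.3 = -0.25, so it heads the negative list, and "a" (about 0.15) heads the positive list.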
Activation Density: 0.467%
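Activation density is usually read as the share of sampled tokens on which the feature fires, meaning its activation is above zero. At 0.467%, that is roughly 4,670 active tokens per million. The minimal sketch below assumes that definition, with a threshold of zero and a plain list of per-token activations. The helper name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 467 active positions out of 100,000 sampled tokens.
sample = [0.0] * 99_533 + [1.2] * 467
print(f"{activation_density(sample):.3f}%")
```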