INDEX
Explanations
mentions of historical figures and scholarly works
Negative Logits
aison      -0.15
ÙĬاÙĨ      -0.15
Schmidt    -0.14
ekil       -0.14
ương       -0.14
suit       -0.13
ONO        -0.13
ylene      -0.13
roadcast   -0.13
osate      -0.13
Positive Logits
et           0.21
writing      0.17
wrote        0.16
SND          0.16
argument     0.15
(ed          0.15
interviewed  0.15
ed           0.15
grounding    0.14
write        0.14
Activations Density 0.166%