INDEX
Explanations
references to historical and influential figures and their contributions
Negative Logits
varsa      -0.09
various    -0.08
GOODMAN    -0.08
inati      -0.08
Various    -0.08
íħIJ       -0.08
оÑĤли      -0.08
iddi       -0.07
celik      -0.07
éĥİ        -0.07
Positive Logits
rather     0.12
instead    0.11
actually   0.10
none       0.09
neither    0.09
Rather     0.09
not        0.08
rather     0.08
entirely   0.08
something  0.08
Activations Density 0.045%