Explanations
capitalized acronyms related to organizations or titles
Negative Logits
taboola     -0.66
Allaah      -0.64
rooms       -0.64
surpr       -0.63
speedy      -0.62
ogene       -0.62
clutter     -0.60
raise       -0.59
thinkable   -0.59
Spoiler     -0.58
Positive Logits
FU   1.08
KA   1.07
KI   1.03
KT   1.00
HY   0.99
ZA   0.95
KO   0.95
HE   0.94
OT   0.93
JA   0.93
Activations Density 0.121%