INDEX
Explanations
instances of the word "ideally" indicating desired outcomes or best practices
Negative Logits
lemen   -0.15
uent    -0.15
Jud     -0.15
617     -0.14
auen    -0.14
пож     -0.14
occan   -0.14
lien    -0.14
ấp      -0.14
Band    -0.13
Positive Logits
اÙĦÙħÙĦ  0.17
luž     0.16
romo    0.15
unta    0.15
çIJ     0.15
umno    0.15
lon     0.15
822     0.14
.gdx    0.14
ÅĻi     0.14
Activations Density 0.004%