Explanations
mentions of "kind of" or phrases indicating classification or types
Negative Logits
chn         -0.16
alus        -0.15
ir          -0.14
cooldown    -0.14
ary         -0.13
esis        -0.13
Main        -0.13
Fle         -0.13
cogn        -0.13
As          -0.13
Positive Logits
weise       0.17
kova        0.16
quot        0.15
ëģĶ         0.15
rome        0.15
tras        0.15
ofday       0.14
olson       0.14
zia         0.14
leftright   0.14
Activations Density: 0.045%