INDEX
Explanations
words that indicate frequency or repetition
Negative Logits
otherwise    -0.15
U+200E       -0.14
oro          -0.14
руп          -0.14
如此         -0.14
OTHERWISE    -0.14
さら         -0.14
izens        -0.13
velt         -0.13
obot         -0.13
Positive Logits
mostly       0.32
mostly       0.30
during       0.29
when         0.29
sometimes    0.28
sometimes    0.28
Mostly       0.27
whenever     0.27
when         0.25
during       0.25
Activations Density 0.033%