INDEX
Explanations
The word "so" and its variants, suggesting a focus on expressions of consequence or emphasis.
Negative Logits
ay       -0.16
uel      -0.16
oki      -0.15
ouse     -0.15
ur       -0.14
ello     -0.14
ousse    -0.14
upa      -0.14
reh      -0.14
_DO      -0.14
Positive Logits
are        0.45
were       0.35
_are       0.30
будут      0.27   (Russian, "will be")
являются   0.26   (Russian, "are")
هستند      0.26   (Persian, "are")
ARE        0.25
są         0.24   (Polish, "are")
.are       0.24
Are        0.23
Activation Density: 0.053%
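As a rough guide to reading the density figure, here is a minimal sketch that assumes "activation density" means the fraction of token positions where the feature's activation is above zero. The definition, the `activation_density` helper, and its `threshold` parameter are assumptions for illustration, not the dashboard's documented computation. Under that reading, 0.053% corresponds to about 53 active positions per 100,000 tokens, or roughly one in every 1,900.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature fires (assumed: activation > threshold).
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 53 active positions out of 100,000 tokens.
acts = [1.0] * 53 + [0.0] * (100_000 - 53)
print(f"{activation_density(acts):.3%}")  # 0.053%
```

A sparse feature like this one fires on only a tiny share of tokens, so a small density is expected rather than a sign of a dead feature.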