INDEX
Explanations
informal conversational phrases and expressions of agreement
Negative Logits
which     -0.17
igli      -0.15
WOW       -0.14
бо        -0.14
DidLoad   -0.14
whose     -0.13
ewan      -0.13
isini     -0.13
IALOG     -0.13
or        -0.13
Positive Logits
there     0.17
thems     0.16
they      0.16
ürn       0.15
we        0.15
they      0.15
tep       0.15
_this     0.14
there     0.14
we        0.14
Activations Density 0.178%