Explanations
actions and events that indicate the beginning or realization of a situation
Negative Logits
if       -0.21
when     -0.18
vi       -0.18
never    -0.18
as       -0.17
until    -0.17
both     -0.17
much     -0.17
quite    -0.16
Gh       -0.16
Positive Logits
_______,     0.19
574          0.17
，则         0.16
стало        0.16
thì          0.15
maal         0.15
755          0.15
erotische    0.15
Ù쨥ÙĨ       0.15
ạn           0.14
Activations Density 0.209%