INDEX
Explanations
instances of temporal phrases indicating time or duration
Negative Logits
ingt          -0.22
нему          -0.17
сю            -0.15
Fullscreen    -0.15
おり          -0.15
shot          -0.14
role          -0.14
ossible       -0.14
icking        -0.14
latin         -0.14
Positive Logits
inception     0.26
then          0.23
none          0.23
childhood     0.23
nobody        0.21
nothing       0.21
they          0.20
forth         0.20
we            0.20
there         0.20
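
Token strings in logit lists like the ones above are sometimes shown in the byte-level form used by GPT-2-style BPE vocabularies, where every non-printable or non-ASCII byte is remapped to a visible character (for example, `ÑģÑİ` stands for the UTF-8 bytes of `сю`). The following is a minimal sketch of the reverse mapping. It assumes the tokenizer uses the GPT-2 byte-to-unicode table, which this page does not confirm, and `decode_token` is a hypothetical helper name, not part of any library.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 byte -> visible-character table."""
    # Printable bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text.

    Assumption: characters outside the table (e.g. already-decoded
    Cyrillic) are passed through as their own UTF-8 bytes.
    """
    raw = b"".join(
        bytes([UNICODE_TO_BYTE[ch]]) if ch in UNICODE_TO_BYTE else ch.encode("utf-8")
        for ch in token
    )
    return raw.decode("utf-8", errors="replace")


print(decode_token("ÑģÑİ"))
print(decode_token("ãģĬãĤĬ"))
```

The same table maps `Ġ` back to a leading space, so `Ġthen` would decode to ` then`.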
Activations Density 0.045%
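
The page does not define how the density figure is computed. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below illustrates that reading only; `activation_density` and its `threshold` parameter are hypothetical names, and the sample data is made up to reproduce the 0.045% figure.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all; the source page does not state this.
    """
    acts = list(activations)
    if not acts:
        return 0.0
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Illustrative data: the feature fires on 9 of 20,000 tokens.
sample = [0.0] * 19_991 + [1.3] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.045% means the feature is active on roughly 4 to 5 tokens out of every 10,000.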