Explanations
phrases indicating a sense of immediacy or current events
Negative Logits
unate         -0.18
oro           -0.15
OTHERWISE     -0.15
otherwise     -0.15
nze           -0.14
егор          -0.14
ilon          -0.14
oras          -0.14
こんにちは    -0.13
ants          -0.13
Positive Logits
withstanding   0.23
adays          0.23
же             0.18
itz            0.18
here           0.17
fter           0.17
乎             0.15
HERE           0.15
UIP            0.14
aken           0.14
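The page does not say how the two logit lists are produced. A common approach on feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and take the k most negative and k most positive token scores. The minimal sketch below does exactly that in plain Python; the function names, the toy vocabulary and the toy weights are hypothetical and do not come from this page.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting the decoder direction.

    Assumption: decoder_dir has length d_model and W_U is laid out as
    d_model rows by vocab_size columns, so score[t] = sum_i d[i] * W_U[i][t].
    """
    vocab_size = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab_size)]


def top_logits(decoder_dir: Sequence[float],
               W_U: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of (token, score), k entries each."""
    scores = logit_effects(decoder_dir, W_U)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["here", "otherwise", "now", "cat"]
W_U = [[0.2, -0.1, 0.1, -0.05],
       [0.1, -0.2, 0.2, 0.0]]
decoder_dir = [1.0, 0.5]

positive, negative = top_logits(decoder_dir, W_U, vocab, k=2)
print([tok for tok, _ in positive])  # ['here', 'now']
print([tok for tok, _ in negative])  # ['otherwise', 'cat']
```

In a real model the vocabulary has tens of thousands of tokens, which is why the lists above contain subword fragments such as "adays" and "fter" alongside whole words.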
Activation density: 0.027%
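Activation density is the fraction of sampled tokens on which the feature fires. A value of 0.027% means it fires on roughly 27 out of every 100,000 tokens, so the feature is very sparse. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The page does not state its threshold or sample size, and the function name is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly greater
    than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 100,000 tokens, of which 27 activate the feature.
acts = [0.0] * 99_973 + [1.5] * 27
density = activation_density(acts)
print(f"{density:.3%}")  # 0.027%
```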