Explanations
references to time-related phrases or markers
Negative Logits
away     -0.17
ари      -0.15
chal     -0.14
tl       -0.14
sWith    -0.14
te       -0.14
nga      -0.14
ych      -0.14
ennes    -0.14
yn       -0.14
Positive Logits
-than    0.33
_than    0.31
than     0.29
Than     0.26
THAN     0.24
than     0.24
_THAN    0.22
Than     0.21
niż      0.18
всего    0.18
Activations Density 0.014%