Explanations
references to time and temporal expressions
Negative Logits
朋          -0.16
しく        -0.15
óln         -0.15
Initial     -0.15
stří        -0.14
Altern      -0.14
.initial    -0.14
lever       -0.13
луг         -0.13
-gnu        -0.13
Positive Logits
gency       0.15
outu        0.15
nal         0.14
iped        0.13
iese        0.13
oval        0.13
938         0.13
802         0.13
iper        0.13
place       0.13
Activations Density 0.006%