INDEX
Explanations
dates and time-related expressions
Negative Logits
ắm
-0.16
Trou
-0.15
kabil
-0.15
85
-0.15
86
-0.15
σα
-0.15
osas
-0.14
92
-0.14
lech
-0.14
brtc
-0.14
Positive Logits
194
0.79
۱۹۴
0.51
195
0.47
193
0.35
۱۹۵
0.31
WWII
0.24
104
0.24
196
0.22
Stalin
0.19
wartime
0.18
Activations Density 0.055%