INDEX
Explanations
instances of dates and time-related phrases
Negative Logits
WW      -0.16
WWII    -0.15
-grow   -0.14
Bek     -0.14
ulin    -0.14
.ov     -0.14
piar    -0.14
۱۹۵     -0.14
¬ģ      -0.14
chang   -0.13
Positive Logits
200     0.74
201     0.65
199     0.64
۲۰۰     0.43
198     0.39
۱۹۹     0.36
۲۰۱     0.35
Bush    0.34
202     0.33
bush    0.30
Activations Density 0.465%