INDEX
Explanations
references to historical figures and events related to colonization and imperialism
Negative Logits
196          -0.20
197          -0.17
195          -0.17
USSR         -0.15
viron        -0.15
ught         -0.14
limburg      -0.14
vidéos       -0.14
Soviet       -0.14
۱۹۶          -0.14
Positive Logits
188          0.33
186          0.32
187          0.31
189          0.30
185          0.26
Victorian    0.26
183          0.24
184          0.24
191          0.23
190          0.22
Activations Density: 0.780%