INDEX
Explanations
references to historical figures and events
Negative Logits
Token       Logit
149         -0.18
157         -0.18
161         -0.17
Napoleon    -0.17
celik       -0.16
154         -0.16
155         -0.16
166         -0.15
162         -0.15
mdl         -0.15
Positive Logits
Token       Logit
Norman      0.29
Norm        0.29
norm        0.26
Norm        0.26
норм        0.24
norm        0.22
norms       0.21
.norm       0.19
_norm       0.19
Counts      0.18
Activations Density 0.044%