INDEX
Explanations
news-related information such as breaking news updates, statements from public figures, and detailed accounts of events
Neuron Alignment
Index   Value   % of L₁
752     +0.12   0.4%
1177    +0.10   0.3%
394     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
752     +0.12      0.06
16      +0.10      0.08
897     +0.10      0.05
Negative Logits
.      -0.82
,      -0.75
;      -0.74
are    -0.72
/      -0.72
...    -0.71
、     -0.70
↵↵     -0.69
and    -0.69
。     -0.68
Positive Logits
Mlle      1.57
emphat    1.56
vété      1.55
dovr      1.53
increa    1.52
affor     1.52
sappi     1.52
hentai    1.51
milf      1.51
unlaw     1.50
Activations Density 0.614%