INDEX
Explanations
words related to historical events or societal structures
Neuron Alignment
Index   Value   % of L₁
1343    +0.14   0.4%
1438    +0.09   0.3%
1108    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1081    +0.14      0.07
459     +0.09      0.05
270     +0.09      0.05
Negative Logits
disagre     -1.84
reluct      -1.82
shenan      -1.76
increa      -1.75
depic       -1.75
impra       -1.72
encomp      -1.70
affor       -1.70
philanth    -1.70
milf        -1.70
Positive Logits
<bos>              0.98
SequentialGroup    0.66
المعرف             0.65
depending          0.63
or                 0.63
else               0.63
oder               0.63
etc                0.62
":[{               0.59
หรือ               0.58
Activations Density: 0.842%