INDEX
Explanations
words related to changes or dynamics in a situation
Neuron Alignment
Index   Value   % of L₁
1253    +0.09   0.3%
1784    +0.07   0.2%
581     +0.07   0.2%
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1477    +0.09           0.04
1784    +0.07           0.03
1398    +0.07           0.01
Negative Logits
toledo      -0.91
mlb         -0.91
stockholm   -0.88
secon       -0.88
beverly     -0.88
madonna     -0.85
erad        -0.83
getty       -0.83
reluct      -0.81
fto         -0.80
Positive Logits
nor               0.73
<bos>             0.68
whatsoever        0.55
underlying        0.54
oder              0.54
or                0.53
fundamentals      0.53
overall           0.53
resourceCulture   0.51
Cependant         0.50
Activations Density: 0.214%