INDEX
Explanations
expressions of personal feelings and progress
Neuron Alignment
Index   Value   % of L₁
50      +0.33   1.1%
478     +0.10   0.3%
1048    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1121    +0.33      0.06
1305    +0.10      0.05
1048    +0.09      0.02
Negative Logits
<bos>       -0.82
uncin       -0.67
ⓧ           -0.63
charité     -0.59
macrop      -0.59
suscep      -0.57
zirc        -0.55
lamella     -0.54
swarovski   -0.54
oleo        -0.53
Positive Logits
progress          0.90
successes         0.78
improvements      0.77
improvement       0.77
hamdu             0.75
lanka             0.71
achievements      0.71
accomplishments   0.71
progress          0.68
positive          0.67
Activations Density 1.437%