INDEX
Explanations
statements emphasizing a particular viewpoint or argument
Neuron Alignment
Index   Value   % of L₁
872     +0.10   0.3%
1719    +0.08   0.2%
470     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
454     +0.10      0.02
1355    +0.08      0.03
872     +0.07      0.04
Negative Logits
capulco    -0.82
kani       -0.81
susun      -0.78
magis      -0.77
ibiza      -0.76
guma       -0.75
haup       -0.74
tionally   -0.73
dirond     -0.72
canel      -0.71
Positive Logits
point      1.00
point      0.86
Point      0.79
POINT      0.77
Point      0.74
POINT      0.73
points     0.67
punto      0.64
points     0.61
argument   0.60
Activations Density: 0.218%