INDEX
Explanations
negative sentiments or critiques
Neuron Alignment
Index    Value    % of L₁
249      +0.18    1.1%
203      +0.15    0.8%
97       +0.14    0.8%
Correlated Neurons
Index    P. Corr.    Cos Sim.
249      +0.18       0.07
97       +0.15       0.07
203      +0.14       0.07
Negative Logits
-"          -1.78
atically    -1.62
orect       -1.57
OOGLE       -1.54
eful        -1.49
"./         -1.49
ributes     -1.47
ensitive    -1.45
"/          -1.45
umns        -1.42
Positive Logits
¾       1.99
©       1.88
ī       1.84
·       1.80
ish     1.78
Ĥ       1.73
¯       1.71
®       1.69
heit    1.60
¶       1.57
Activations Density 0.218%