INDEX
Explanations
negative opinions or critiques, particularly related to political figures
Neuron Alignment
Index   Value   % of L₁
539     +0.07   0.2%
1020    +0.07   0.2%
398     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1020    +0.07      0.05
16      +0.07      0.06
1293    +0.07      0.04
Negative Logits
maksi     -1.07
kosme     -1.01
alkoh     -1.00
kompati   -0.94
akut      -0.92
silikon   -0.92
radikal   -0.91
keramik   -0.90
optik     -0.90
Kategor   -0.86
Positive Logits
serious       0.89
actual        0.85
actual        0.82
serious       0.81
genuine       0.80
real          0.75
ACTUAL        0.73
seriousness   0.72
Actual        0.69
genuinely     0.68
Activations Density 0.658%