INDEX
Explanations
texts describing excellence or things related to excellence
Neuron Alignment
Index   Value   % of L₁
478     +0.16   0.6%
994     +0.13   0.5%
197     +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
197     +0.16      0.03
1512    +0.13      0.03
1145    +0.12      0.03
Negative Logits
McLaugh      -0.65
indestru     -0.64
unwarran     -0.61
unspeak      -0.60
compréhen    -0.59
McF          -0.57
impelled     -0.56
inconce      -0.55
reclamar     -0.55
Vaugh        -0.54
Positive Logits
excellent     0.84
cellent       0.80
excellent     0.79
Excellent     0.78
excellence    0.77
EXCELL        0.77
tille         0.75
Excellent     0.75
tyn           0.72
corpi         0.72
Activations Density 0.112%