Explanations
verbs expressing disagreement or challenge
Neuron Alignment
Index   Value   % of L₁
1108    +0.10   0.3%
369     +0.09   0.3%
872     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1601    +0.10      0.05
100     +0.09      0.04
1305    +0.09      0.05
Negative Logits
SizeMode    -0.59
Unies       -0.53
quibus      -0.50
zene        -0.49
Picchu      -0.48
lgica       -0.48
fosfor      -0.48
NRAS        -0.47
MLLoader    -0.47
etik        -0.46
Positive Logits
shenan      1.20
disreg      1.12
unspeak     1.00
maneu       0.97
suscep      0.93
reluct      0.91
intersper   0.91
apprehen    0.90
horrend     0.86
sophistic   0.85
Activations Density: 0.288%