INDEX
Explanations
phrases related to legal, criminal, and political topics
Neuron Alignment
Index   Value   % of L₁
792     +0.09   0.3%
1385    +0.09   0.3%
1372    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
792     +0.09      0.05
1440    +0.09      0.05
1509    +0.08      0.04
Negative Logits
encomp    -0.92
disreg    -0.81
excru     -0.74
inev      -0.73
depic     -0.73
unve      -0.72
antem     -0.72
macrop    -0.71
suscep    -0.69
accla     -0.68
Positive Logits
HasFactory        0.63
pertise           0.59
TextAppearance    0.57
KELEY             0.55
aspi              0.53
experimente       0.53
religione         0.53
tille             0.52
locu              0.52
visse             0.52
Activations Density 0.583%