INDEX
Explanations
mentions of legal and political terms and events
Neuron Alignment
Index    Value    % of L₁
1150     +0.14    0.4%
453      +0.11    0.3%
1842     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
946      +0.14       0.07
1835     +0.11       0.04
1150     +0.10       0.02
Negative Logits
alpes     -0.86
adal      -0.83
utop      -0.81
nera      -0.79
torba     -0.78
marte     -0.77
marmor    -0.76
polig     -0.76
sement    -0.76
ananas    -0.75
Positive Logits
him        0.94
he         0.90
himself    0.75
his        0.72
acesta     0.71
shenan     0.71
He         0.70
she        0.69
éste       0.67
He         0.66
Activations Density: 1.349%