Explanations: political terms and concepts
Neuron Alignment
Index   Value   % of L₁
1331    +0.09   0.3%
1533    +0.08   0.3%
1335    +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1129    +0.09      0.03
1101    +0.08      0.02
353     +0.08      0.03
Negative Logits
prouve      -0.61
reconnaît   -0.59
nomme       -0.55
mène        -0.54
paraît      -0.54
désigne     -0.53
occupe      -0.53
étu         -0.52
reçoit      -0.52
soutient    -0.52
Positive Logits
embodi      0.67
veiks       0.65
provato     0.64
sentito     0.60
thut        0.59
facciamo    0.58
fays        0.58
tew         0.56
espri       0.56
yaa         0.55
Activations Density: 0.140%