Explanations
phrases related to political agreements, negotiations, and policies
Neuron Alignment
Index    Value    % of L₁
605      +0.12    0.4%
486      +0.12    0.4%
2034     +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
486      +0.12       0.04
2030     +0.12       0.04
874      +0.11       0.04
Negative Logits
casio    -1.68
fta      -1.57
blos     -1.56
squa     -1.54
aen      -1.52
mef      -1.52
fuj      -1.50
scrat    -1.48
seiz     -1.46
fto      -1.45
Positive Logits
“        0.84
“        0.83
”        0.79
"        0.79
«        0.71
,“       0.71
}$       0.70
نسخة     0.69
”        0.69
»        0.69
Activations Density: 0.165%