INDEX
Explanations
references to geopolitical events and international diplomacy
Neuron Alignment
Index   Value   % of L₁
214     +0.13   0.5%
596     +0.13   0.5%
1637    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
214     +0.13      0.05
596     +0.13      0.04
303     +0.12      0.04
Negative Logits
habang    -0.61
itong     -0.59
vī        -0.58
siyang    -0.57
kapag     -0.57
ējās      -0.54
vairāk    -0.54
bēr       -0.52
susun     -0.51
akong     -0.51
Positive Logits
—         0.76
”—        0.66
ercice    0.62
—         0.59
!—        0.57
——        0.57
)—        0.57
toscana   0.57
.—        0.55
.”—       0.55
Activations Density 0.164%