INDEX
Explanations
mentions of the word "America."
Neuron Alignment
Index   Value   % of L₁
50      +0.26   1.5%
1296    +0.17   1.0%
1565    +0.10   0.6%
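
The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. This reading is an assumption, not something the page states. The sketch below shows that calculation on a toy vector; the names `l1_share` and `top_aligned` and the toy weights are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights):
    """Return |w_i| / sum_j |w_j| for every entry (assumed meaning of '% of L1')."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        # Degenerate all-zero vector: no entry holds any share.
        return [0.0 for _ in weights]
    return [abs(w) / l1 for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, share) for the k entries with the largest magnitude."""
    shares = l1_share(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


# Toy weight vector (hypothetical values, chosen so the L1 norm is exactly 1.0).
toy_weights = [0.5, -0.25, 0.125, 0.125]
rows = top_aligned(toy_weights, k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, the three rows in the table would each imply an L₁ norm of roughly 17 (for example, 0.26 / 0.015), which is at least internally consistent.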
Correlated Neurons
Index   P. Corr.   Cos Sim.
1296    +0.26      0.05
1565    +0.17      0.03
78      +0.10      0.03
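
The two columns here are assumed to be a Pearson correlation ("P. Corr.") between the feature's activations and a neuron's activations over the same tokens, and a cosine similarity ("Cos Sim.") between the feature's direction and the neuron's weight vector. The page does not define either. A minimal sketch of both measures follows; the function names and toy inputs are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy activations over five tokens (hypothetical values).
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.8]
neuron_acts = [0.1, 0.0, 0.9, 0.2, 0.5]
print(f"P. Corr. = {pearson(feature_acts, neuron_acts):+.2f}")
```

The two measures answer different questions: correlation compares behaviour over data, while cosine similarity compares directions in weight space, so a neuron can rank high on one and low on the other.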
Negative Logits
Token      Logit
<bos>      -3.15
ⓧ          -0.80
/***       -0.70
(blank)    -0.70
/**        -0.69
defray     -0.65
endow      -0.65
mobilize   -0.64
waive      -0.63
disbur     -0.63
Positive Logits
Token      Logit
Châ        1.10
Juf        1.01
Traité     0.99
Schrö      0.98
Kün        0.95
America    0.95
bourg      0.93
Amé        0.92
Heere      0.91
Aner       0.91
Activations Density 0.038%
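
An activation density like the one above is assumed here to be the fraction of sampled tokens on which the feature fires at all. The page gives only the number, not the definition. A minimal sketch under that assumption, with a hypothetical function name and toy data:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample: the feature fires on 1 of 4 tokens (hypothetical values).
toy_acts = [0.0, 0.0, 2.5, 0.0]
density = activation_density(toy_acts)
print(f"Activations Density {density:.3%}")
```

Read this way, a density of 0.038% would mean the feature is active on roughly 4 of every 10,000 tokens.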