INDEX
Explanations
phrases related to various topics like current events, history, politics, and law
Neuron Alignment
Index   Value   % of L₁
577     +0.10   0.3%
198     +0.09   0.3%
1265    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
198     +0.10      0.05
240     +0.09      0.04
1183    +0.09      0.03
Negative Logits
alnız             -0.67
Meksiku           -0.66
<bos>             -0.63
spania            -0.56
ypeł              -0.56
témoignages       -0.56
Италијани         -0.55
splitlines        -0.55
SpringBootTest    -0.55
acakt             -0.54
Positive Logits
toledo       0.81
mef          0.80
castro       0.78
fta          0.75
mallorca     0.75
psg          0.74
frankfurt    0.73
sml          0.72
dsg          0.71
aen          0.71
Activations Density 0.160%