INDEX
Explanations
occurrences of the word "said" in sentences
Neuron Alignment
Index   Value   % of L₁
1499    +0.09   0.3%
421     +0.08   0.2%
1068    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
713     +0.09      0.04
1068    +0.08      0.05
1895    +0.08      0.04
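The page does not state how the two columns above are computed. As a minimal sketch, assuming "P. Corr." is the Pearson correlation between two units' activations over the same tokens and "Cos Sim." is the cosine similarity between two weight vectors, the quantities could be calculated as follows. The function names `pearson` and `cosine` and the toy inputs are hypothetical and are not taken from the dashboard.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations share the same 1/n factor, so it cancels.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy data (assumed, not from the dashboard): perfectly correlated activations
# and orthogonal weight vectors.
corr = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
sim = cosine([1.0, 0.0], [0.0, 1.0])
```

Under this reading, a small value in both columns, as in the rows above, would mean that neither the activations nor the weight directions of the listed neurons line up closely with the feature.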
Negative Logits
anse      -0.79
solidar   -0.77
utop      -0.75
gero      -0.73
robus     -0.72
reger     -0.71
palet     -0.69
tenden    -0.67
ideolog   -0.67
/**       -0.65
Positive Logits
they       0.56
said       0.56
głó        0.54
its        0.53
fréquent   0.50
unspeak    0.50
said       0.49
gaily      0.49
憾         0.48
it         0.48
Activations Density: 0.182%
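The page gives the density figure without a definition. A minimal sketch, assuming density means the fraction of sampled tokens on which the feature's activation is above zero, is shown below. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy sample (assumed): one active token out of four.
density = activation_density([0.0, 0.0, 0.5, 0.0])
print(f"{density:.3%}")
```

Under this assumption, a density of 0.182% would correspond to the feature firing on roughly 1 in every 550 tokens.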