INDEX
Explanation (New Auto-Interp): pronouns referring to groups of people
Neuron Alignment
Index    Value    % of L₁
381      +0.10    0.3%
1262     +0.10    0.3%
845      +0.10    0.3%
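One plausible reading of the "% of L₁" column is each weight's share of the L₁ norm of the feature's decoder vector over MLP neurons. The sketch below works under that assumption. The helper name `neuron_alignment`, its `decoder_row` input, and the choice to rank neurons by absolute weight are all hypothetical, and the dashboard's own ranking and normalization may differ.

```python
def neuron_alignment(decoder_row, k=3):
    """Top-k neurons by |weight| in one feature's decoder vector (hypothetical helper).

    Returns (neuron_index, weight, share_of_l1) tuples, where share_of_l1 is
    |weight| divided by the L1 norm of the whole decoder vector.
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by absolute weight, largest first (assumed ordering).
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1_norm) for i in ranked[:k]]


# Toy 4-neuron decoder vector whose L1 norm is 10.0.
for idx, value, share in neuron_alignment([1.0, -5.0, 2.0, 2.0]):
    print(f"{idx:<6} {value:+.2f}  {share:.1%}")
```

Because `sorted` is stable, neurons with equal magnitude keep their original index order.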
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
1919     +0.10            0.07
331      +0.10            0.06
1510     +0.10            0.05
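Below is a minimal sketch of the two similarity columns. It assumes both columns compare the feature's per-token activations with a neuron's activations over the same tokens, with cosine similarity acting as the uncentered counterpart of Pearson correlation. The function names and toy inputs are hypothetical, and the dashboard may compute these quantities differently.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Norms of the mean-centered sequences; the 1/n factors cancel in the ratio.
    centered_norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    centered_norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (centered_norm_x * centered_norm_y)


def cosine_sim(xs, ys):
    """Cosine similarity: the same ratio as Pearson, but without subtracting means."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))


# Toy per-token activations for one feature and one neuron.
feature_acts = [0.0, 1.2, 0.0, 3.4, 0.5]
neuron_acts = [0.1, 0.9, -0.2, 2.8, 0.0]
print(f"{pearson_corr(feature_acts, neuron_acts):+.2f}  {cosine_sim(feature_acts, neuron_acts):.2f}")
```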
Negative Logits
monaster      -0.89
Singapur      -0.88
optik         -0.83
maksi         -0.83
Campionato    -0.81
poliuret      -0.81
parlamento    -0.80
antik         -0.79
meras         -0.78
lele          -0.75
Positive Logits
shenan        0.91
miscon        0.82
FTFY          0.71
intersper     0.69
indestru      0.69
unspeak       0.67
Rgds          0.67
mustn         0.64
sophistic     0.63
disagre       0.63
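The two logit lists can be sketched as the direct effect of the feature's decoder direction on the output vocabulary. Multiply the direction by the unembedding matrix, then take the k lowest and k highest entries. This assumes a hypothetical `W_U` stored as d_model rows by vocabulary columns. It also ignores any final layer norm or rescaling the dashboard may apply, so it is not expected to reproduce the values above.

```python
def logit_lists(decoder_dir, W_U, vocab, k=10):
    """Return (positive, negative) token lists for a feature direction (k >= 1).

    decoder_dir: length-d_model list; W_U: d_model rows x len(vocab) columns.
    Both are hypothetical inputs; no layer norm or rescaling is applied.
    """
    n_vocab = len(vocab)
    # Direct logit effect of the direction on every vocabulary entry.
    logits = [sum(decoder_dir[d] * W_U[d][t] for d in range(len(decoder_dir)))
              for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: logits[t])  # ascending by logit
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
pos, neg = logit_lists([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
print(pos, neg)
```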
Activation Density: 0.374%
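Activation density is the fraction of tokens on which the feature fires. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of zero, which is a hypothetical default. Under that reading, 0.374% corresponds to roughly 374 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy per-token activations: the feature fires on 1 of 4 tokens.
density = activation_density([0.0, 0.0, 1.5, 0.0])
print(f"Activation Density: {density:.3%}")
```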