INDEX
Explanations
names of specific locations or events
Neuron Alignment
Index   Value   % of L₁
889     +0.18   1.0%
966     +0.14   0.8%
1145    +0.14   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
889     +0.18      0.04
1363    +0.14      0.03
501     +0.14      0.03
Negative Logits
<bos>           -1.06
República       -0.61
RTEE            -0.57
Carla           -0.56
Cat             -0.55
Categoria       -0.54
Categorie       -0.53
Bước            -0.53
autorytatywna   -0.53
bruge           -0.53
Positive Logits
uncin       0.94
frankfurt   0.89
Honest      0.89
peppa       0.89
isabel      0.88
Honest      0.85
maneu       0.84
Gre         0.82
shenan      0.82
riviera     0.81
Activations Density 0.629%