Explanations
references to controversial or unusual behavior and events involving animals
Neuron Alignment
Index   Value   % of L₁
184     +0.16   0.5%
1533    +0.13   0.4%
163     +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
163     +0.16      0.04
184     +0.13      0.03
1533    +0.12      0.03
Negative Logits
notor   -0.99
mef     -0.95
permu   -0.93
gend    -0.92
anse    -0.89
utop    -0.88
hoj     -0.88
doman   -0.87
palab   -0.87
franz   -0.86
Positive Logits
always         0.70
was            0.68
occasionally   0.67
depended       0.65
knew           0.65
never          0.65
thrived        0.65
wasn           0.64
kept           0.62
sometimes      0.61
Activations Density: 0.723%