INDEX
Explanations
instances where someone takes authoritative action
Neuron Alignment
Index   Value   % of L₁
1984    +0.13   0.4%
687     +0.12   0.4%
1446    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
687     +0.13      0.06
1984    +0.12      0.06
331     +0.10      0.05
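For readers unfamiliar with the two column headers, here is a minimal sketch of the quantities they most plausibly name, assuming "P. Corr." is a Pearson correlation between two activation series and "Cos Sim." is a cosine similarity between two vectors. Which series and vectors the dashboard actually compares is not stated here, so the function names and inputs below are hypothetical.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation of two equal-length sequences (assumed meaning of 'P. Corr.')."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and each series' sum of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors (assumed meaning of 'Cos Sim.')."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Pearson correlation subtracts each series' mean before comparing, while cosine similarity does not, so the two columns can differ even when computed on the same pair of vectors.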
Negative Logits
gaily       -0.93
apprehen    -0.91
nobly       -0.90
vainly      -0.89
ineffec     -0.89
inconce     -0.87
unspeak     -0.84
tolerably   -0.81
disagre     -0.78
disgra      -0.77
Positive Logits
WITH        0.69
WITH        0.64
pertise     0.63
with        0.61
sentito     0.58
sightly     0.58
gusto       0.57
soggior     0.56
skimage     0.56
With        0.56
Activations Density 0.218%