INDEX
Explanations
references to key properties or characteristics of various subjects
Neuron Alignment
Index    Value    % of L₁
126      +0.17    0.9%
361      +0.12    0.7%
506      +0.11    0.6%
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
361      +0.17            0.05
287      +0.12            0.05
365      +0.11            0.04
Negative Logits
Token         Logit
ivated        -1.68
word          -1.62
verdict       -1.50
acon          -1.47
respond       -1.47
stance        -1.46
ft            -1.43
aloud         -1.41
proceeding    -1.40
ivation       -1.36
Positive Logits
Token              Logit
à°¿                1.89
à±į                1.63
ĻĤ                 1.59
cott               1.58
EGFP               1.58
indows             1.58
BET                1.52
DAMAGE             1.48
´                  1.44
Redistributions    1.44
Activations Density: 0.035%