INDEX
Explanations
URLs and technical information like paper titles and author names
Neuron Alignment
Index    Value    % of L₁
1343     +0.17    0.5%
1150     +0.11    0.3%
1978     +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1343     +0.17       0.05
924      +0.11       0.05
981      +0.11       0.04
Negative Logits
withal        -0.84
blest         -0.82
unspeak       -0.81
tupperware    -0.79
indescri      -0.77
ecru          -0.76
gaily         -0.75
hoody         -0.74
mistak        -0.73
McLaugh       -0.73
Positive Logits
abbra          0.69
offerta        0.67
‹              0.66
espressione    0.65
rossi          0.62
stili          0.62
espres         0.61
obblig         0.61
uniti          0.60
ristor         0.60
Activations Density: 0.198%