INDEX
Explanations
references to file locations or downloadable documents
Neuron Alignment
Index   Value   % of L₁
1343    +0.20   0.6%
1445    +0.10   0.3%
1403    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.20      0.04
1003    +0.10      0.02
771     +0.10      0.04
Negative Logits
disagre     -1.51
apprehen    -1.46
gaily       -1.43
unspeak     -1.39
snoopy      -1.37
excru       -1.31
unwarran    -1.31
tolerably   -1.28
jurassic    -1.28
outlander   -1.26
Positive Logits
twimg            0.63
SourceChecksum   0.59
img              0.57
cache            0.54
file             0.54
cdn              0.54
img              0.53
***!             0.53
Ause             0.52
files            0.52
Activations Density 0.145%