Explanations
mentions or references to specific universities
Neuron Alignment
Index   Value   % of L₁
227     +0.14   0.4%
1013    +0.12   0.3%
1741    +0.11   0.3%
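The "% of L₁" column can be read as each neuron's share of the feature vector's total absolute weight. The sketch below assumes the column is |value| divided by the L₁ norm of the whole vector; that definition is an assumption, since the page does not state it. The function names and the toy vector are hypothetical and are not the real weights.

```python
def l1_share(vector, index):
    """Return |vector[index]| as a fraction of the vector's L1 norm."""
    l1_norm = sum(abs(v) for v in vector)
    if l1_norm == 0:
        raise ValueError("a zero vector has no L1 shares")
    return abs(vector[index]) / l1_norm


def top_aligned(vector, k=3):
    """Return (index, value, share) for the k entries with the largest |value|."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    return [(i, vector[i], l1_share(vector, i)) for i in order[:k]]


# Hypothetical toy vector (assumption: stands in for a feature's per-neuron weights).
toy = [0.5, -0.25, 0.125, 0.125]

for index, value, share in top_aligned(toy, k=2):
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under that assumed reading, a value of +0.14 at 0.4% would put the vector's L₁ norm somewhere around 35, up to rounding.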
Correlated Neurons
Index   P. Corr.   Cos Sim.
227     +0.14      0.04
69      +0.12      0.03
1424    +0.11      0.03
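"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity; the page does not say which vectors are being compared, so the sketch below only illustrates how the two measures differ. The function names and toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each vector's mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy data: b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [3.0, 4.0, 5.0]

print(pearson_correlation(a, b))  # close to 1: the offset is removed by centering
print(cosine_similarity(a, b))    # below 1: the offset changes the direction
```

Because Pearson correlation centers the data first, the two columns can disagree in magnitude even when they are computed on the same pair of vectors.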
Negative Logits
Token        Logit
pettico      -0.69
oubted       -0.67
disreg       -0.63
bookId       -0.61
turnips      -0.60
createDate   -0.60
mistak       -0.58
ristmas      -0.56
mauve        -0.56
'&#          -0.56
Positive Logits
Token          Logit
célé           0.52
silikon        0.51
University     0.51
lampa          0.51
distrik        0.51
radikal        0.49
ProtoMessage   0.47
kooper         0.47
konserv        0.47
zhong          0.47
Activations Density: 0.090%