Explanations
references to user-related terms
Neuron Alignment
Index    Value    % of L₁
1222     +0.15    0.5%
1942     +0.14    0.5%
1984     +0.11    0.4%
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
1942     +0.15            0.04
1222     +0.14            0.03
966      +0.11            0.03
Negative Logits
Життєпис     -0.52
Біографія    -0.51
artney       -0.50
auguri       -0.49
prét         -0.49
corações     -0.48
namorados    -0.47
casais       -0.47
meninos      -0.47
ಊ            -0.46
Positive Logits
users    1.11
Users    1.08
user     1.03
users    1.00
User     0.97
user     0.90
Users    0.86
USER     0.84
User     0.82
USER     0.81
Activations Density: 0.037%