INDEX
Explanations
sports-related words and team names
Neuron Alignment
Index   Value   % of L₁
994     +0.14   0.5%
1224    +0.13   0.4%
1861    +0.11   0.4%
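
The page does not define the "% of L₁" column. One plausible reading, stated here as an assumption, is that each entry is the absolute alignment value divided by the L₁ norm of the full alignment vector. A minimal sketch of that calculation follows; `l1_share` and the toy vector are hypothetical and are not data from this page.

```python
def l1_share(values):
    """Return each |value| as a fraction of the vector's L1 norm.

    Assumption: this is what the "% of L1" column reports; the page
    itself does not confirm the definition.
    """
    l1 = sum(abs(v) for v in values)
    if l1 == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0 for _ in values]
    return [abs(v) / l1 for v in values]


# Hypothetical toy vector, chosen only to make the arithmetic easy to follow.
toy = [0.5, -0.25, 0.25]
shares = l1_share(toy)
```

Under that reading, the first row (+0.14 at 0.5%) would correspond to an L₁ norm of roughly 28, although the displayed figures are rounded.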
Correlated Neurons
Index   P. Corr.   Cos Sim.
994     +0.14      0.04
1861    +0.13      0.06
2033    +0.11      0.07
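
The two columns can disagree because Pearson correlation subtracts each vector's mean before comparing, while cosine similarity does not. The page does not say which vectors each metric is computed over, so the sketch below only illustrates the generic definitions on hypothetical toy vectors; the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy vectors, not data from this page.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
cos_uv = cosine_similarity(u, v)     # positive: both vectors have positive entries
corr_uv = pearson_correlation(u, v)  # negative: one rises while the other falls
```

On these toy vectors the cosine similarity is positive while the Pearson correlation is strongly negative, which shows that the two metrics need not track each other.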
Negative Logits
سكانية       -0.76
(blank)      -0.74
מבר          -0.73
تانيه        -0.72
للاسماء      -0.72
клопе        -0.70
OCCURRED     -0.70
Chwiliwch    -0.70
ัติ          -0.69
consultato   -0.68
Positive Logits
disagre    +1.53
increa     +1.50
maneu      +1.47
unspeak    +1.45
encomp     +1.44
inev       +1.44
Ikr        +1.43
Confu      +1.43
ftu        +1.42
reluct     +1.41
Activations Density: 1.311%