Explanations
sports-related references such as team names, games, and results
Neuron Alignment
Index    Value    % of L₁
1967     +0.11    0.3%
1253     +0.10    0.3%
856      +0.08    0.2%
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
1915     +0.11            0.05
2043     +0.10            0.04
81       +0.08            0.03
Negative Logits
يتيمه            -0.55
lccccc           -0.52
onView           -0.51
Personendaten    -0.49
lcccccc          -0.48
Dziękuję         -0.47
rlrl             -0.47
awsze            -0.46
uwag             -0.46
proszę           -0.46
Positive Logits
expéri     0.72
gouver     0.69
broder     0.66
prétend    0.64
catég      0.64
Keny       0.63
démoc      0.61
confé      0.61
Sén        0.59
génér      0.59
Activations Density: 0.311%