INDEX
Explanations
references to specific characters or entities in the context of competitions or games
Neuron Alignment
Index    Value    % of L₁
184      +0.20    0.6%
964      +0.17    0.6%
1343     +0.16    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
981      +0.20       0.03
184      +0.17       0.01
964      +0.16       0.02
Negative Logits
ьаж           -0.79
kasarigan     -0.62
RTSN          -0.61
Campionato    -0.58
Rumuni        -0.56
parlando      -0.56
OCCURRED      -0.55
Espèce        -0.55
abito         -0.52
gemeente      -0.52
Positive Logits
InjectAttribute    0.45
>=",               0.42
houver             0.42
inkább             0.42
LMAO               0.41
Aiheesta           0.40
Lmao               0.40
Applicability      0.39
Jeez               0.39
PyTuple            0.39
Activations Density: 0.088%