Explanations
phrases related to fighting for a cause or right
Neuron Alignment
Index   Value   % of L₁
596     +0.10   0.3%
1527    +0.09   0.3%
25      +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
596     +0.10      0.04
549     +0.09      0.04
1892    +0.09      0.03
Negative Logits
conformidad   -0.56
worin         -0.52
Predecesor    -0.51
asmussen      -0.51
Cyfeiriadau   -0.47
awtextra      -0.47
ensement      -0.46
nexo          -0.46
uxley         -0.46
trás          -0.46
Positive Logits
Henk          0.64
Vle           0.61
Dage          0.60
betterment    0.57
Pij           0.56
Sech          0.55
sake          0.54
Horacio       0.52
Bø            0.52
Katrin        0.52
Activations Density 0.210%