INDEX
Explanations
phrases emphasizing contrasting viewpoints or situations
Neuron Alignment
Index   Value   % of L₁
1150    +0.15   0.5%
1438    +0.12   0.4%
2034    +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1438    +0.15      0.04
200     +0.12      0.03
1781    +0.12      0.04
Negative Logits
Vegeu          -0.84
Glej           -0.78
Савезне        -0.71
lenker         -0.68
Parabéns       -0.67
Související    -0.66
Vaata          -0.66
Externé        -0.65
Наводи         -0.63
Preparación    -0.62
Positive Logits
maneu          1.19
encomp         1.16
intermitt      1.11
indestru       1.11
reluct         1.06
accla          1.02
unve           1.00
disreg         1.00
downvotes      1.00
impra          0.99
Activations Density 0.177%