INDEX
Explanations
situations involving debates or discussions surrounding societal issues or controversial topics
Neuron Alignment
Index   Value   % of L₁
1253    +0.15   0.5%
478     +0.13   0.4%
2034    +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1533    +0.15      0.06
610     +0.13      0.09
478     +0.13      0.07
Negative Logits
Derp      -0.95
Lma       -0.89
Cringe    -0.89
Yess      -0.87
Yeet      -0.81
Noice     -0.79
Fuckin    -0.79
Oof       -0.77
Whoo      -0.77
shenan    -0.76
Positive Logits
paradiso    0.88
palio       0.85
torba       0.81
pendente    0.81
riva        0.81
bronzo      0.78
virtù       0.77
Settembre   0.76
sopr        0.76
Ottobre     0.76
Activations Density: 0.400%