INDEX
Explanations
phrases related to praising or criticizing specific actions or behaviors
Neuron Alignment
Index    Value    % of L₁
1177     +0.11    0.3%
453      +0.11    0.3%
1499     +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
247      +0.11       0.05
1056     +0.11       0.05
100      +0.09       0.04
Negative Logits
tricot       -0.79
suscep       -0.75
vespa        -0.74
cushi        -0.72
cabrio       -0.70
thermomix    -0.69
bordeaux     -0.68
teflon       -0.65
tdci         -0.62
lpg          -0.61
Positive Logits
Manbalar     0.86
Minang       0.82
Banjar       0.81
Palembang    0.81
Karang       0.79
Lampung      0.78
Banten       0.69
Jambi        0.68
Pekan        0.68
Muhamma      0.68
Activations Density 0.283%