INDEX
Explanations
phrases related to allegations of misconduct and controversy
Neuron Alignment
Index   Value   % of L₁
198     +0.09   0.2%
919     +0.09   0.2%
1842    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1422    +0.09      0.04
944     +0.09      0.03
972     +0.08      0.03
Negative Logits
fluo        -0.97
thermomix   -0.95
ampoule     -0.93
oleo        -0.81
clayey      -0.79
swarovski   -0.78
embodi      -0.78
fusible     -0.77
pandan      -0.75
nutella     -0.75
Positive Logits
allegations   0.79
alleged       0.67
allegation    0.66
accusations   0.66
alleging      0.60
sexual        0.59
Muhamma       0.57
scandals      0.57
alleges       0.56
revelations   0.54
Activations Density: 0.499%