Explanations
sexual content and misconduct, especially related to abuse and assault (New Auto-Interp)

Neuron Alignment
Index   Value   % of L₁
1870    +0.14   0.5%
1520    +0.11   0.4%
1218    +0.11   0.4%
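A minimal sketch of how a table like Neuron Alignment could be produced. It assumes (the page does not say so) that Value is the feature's decoder weight on each neuron and that % of L₁ is that weight's magnitude divided by the L1 norm of the whole decoder vector. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Hypothetical helper: the k decoder entries with the largest magnitude.

    Assumes decoder_vec holds the feature's decoder weight for each neuron.
    Returns (neuron index, weight, |weight| / L1 norm of decoder_vec).
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neurons by absolute weight, largest first (ties keep index order).
    order = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in order[:k]]


print(neuron_alignment([0.25, -0.5, 0.125, 0.125]))
```

With the toy vector above, neuron 1 comes first with weight -0.5, which is half of the vector's L1 norm. Ranking by magnitude is a design choice; since every Value shown here is positive, the dashboard may instead rank by signed weight.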

Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1520    +0.14           0.03
1218    +0.11           0.03
325     +0.11           0.03
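A minimal sketch of the two similarity measures, assuming (not confirmed by this page) that both columns compare the feature's activations with a neuron's activations over the same tokens: Pearson correlation on mean-centred values, cosine similarity on raw values. The function name `pearson_and_cosine` is hypothetical.

```python
import math


def pearson_and_cosine(xs, ys):
    """Hypothetical helper: (Pearson correlation, cosine similarity) of two series.

    Pearson correlation mean-centres both series first; cosine similarity
    uses the raw values. A measure whose denominator is zero (for example,
    a constant series) is returned as NaN.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty series of equal length")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mx) ** 2 for x in xs)) * math.sqrt(sum((y - my) ** 2 for y in ys))
    dot = sum(x * y for x, y in zip(xs, ys))
    norms = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    pearson = cov / spread if spread else float("nan")
    cosine = dot / norms if norms else float("nan")
    return pearson, cosine


print(pearson_and_cosine([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]))
```

The two measures can disagree. In the usage line, the series differ only by a constant offset, so their Pearson correlation is 1, while their cosine similarity is below 1 (about 0.95).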

Negative Logits
JoinTable         -0.52
FormBorderStyle   -0.50
ErrorListener     -0.49
núm               -0.48
ExecuteAsync      -0.47
AccessorTable     -0.46
Roskov            -0.45
twimg             -0.44
Pogba             -0.44
placent           -0.44

Positive Logits
sexual      1.14
Sexual      1.01
Sexual      1.00
sexual      0.98
sexually    0.85
pymysql     0.85
relenting   0.79
sightly     0.79
psycopg     0.77
sex         0.77
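Logit lists like these are commonly computed by projecting the feature's decoder direction through the model's unembedding matrix and then reading off the largest and smallest entries. The sketch below follows that assumption. It ignores any layer-norm scaling the dashboard may apply, and `logit_effects` and the toy matrices are hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=10):
    """Hypothetical helper: direct logit effect of a feature direction on every token.

    decoder_vec: d_model floats (the feature's decoder direction).
    unembed:     d_model rows, each holding len(vocab) floats (an unembedding matrix).
    Returns (top-k positive, top-k negative) as lists of (token, score).
    """
    # Score for token t is the dot product of decoder_vec with column t of unembed.
    scores = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # most negative scores first
    return positive, negative


print(logit_effects([1.0, 0.0], [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1))
```

In the toy example only the first residual dimension is active, so token "a" receives the largest boost and token "b" is pushed down the most.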

Activation Density: 0.038%
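Activation density is typically the fraction of tokens on which the feature fires. A density of 0.038% would mean the feature is active on roughly 1 token in 2,600. The minimal sketch below assumes "fires" means an activation above zero; the threshold and the function name `activation_density` are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


print(activation_density([0.0, 0.0, 0.3, 0.0]))
```

Raising `threshold` above zero gives a stricter notion of "active" and therefore a lower density.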