Explanations
terms related to criminal activities and law enforcement
Neuron Alignment
Index    Value    % of L₁
1870     +0.15    0.6%
1416     +0.12    0.4%
1994     +0.11    0.4%
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
1994     +0.15            0.03
1425     +0.12            0.03
1416     +0.11            0.03
Negative Logits
Grüße            -0.55
suga             -0.51
Grath            -0.50
conges           -0.48
redhead          -0.47
motherfucker     -0.47
impotence        -0.47
throwaway        -0.46
AutoScaleMode    -0.45
pikachu          -0.45
Positive Logits
criminal     1.24
Criminal     1.14
criminal     1.13
Criminal     1.11
CRIMINAL     0.94
kriminal     0.93
Crim         0.90
criminals    0.84
crimin       0.83
Crimin       0.75
Activations Density 0.060%