Explanations
References to physical and sexual abuse or assault, particularly involving manipulation, threats, and force.
Neuron Alignment
Index   Value   % of L₁
1978    +0.09   0.3%
1784    +0.07   0.2%
570     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.09      0.07
972     +0.07      0.04
1675    +0.07      0.06
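The two metric columns can be read using the sketch below. It assumes that "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same set of tokens, and that "Cos Sim." is the cosine similarity between two weight vectors. The dashboard does not say which vectors are compared, so treat both definitions as illustrative. The function names `pearson` and `cosine` are hypothetical helpers, not part of any dashboard API.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length activation traces (assumed meaning of 'P. Corr.')."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two traces of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and standard deviations, up to a shared 1/n factor that cancels out.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant trace")
    return cov / (sx * sy)

def cosine(u, v):
    """Cosine similarity of two weight vectors (assumed meaning of 'Cos Sim.')."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical activations over four tokens for a feature and a neuron.
feature_acts = [0.0, 1.0, 2.0, 3.0]
neuron_acts = [0.0, 2.0, 4.0, 6.0]
print(round(pearson(feature_acts, neuron_acts), 6))
print(cosine([1.0, 0.0], [0.0, 1.0]))
```

The two columns can disagree. A neuron can fire alongside the feature (higher correlation) while its weight direction is nearly orthogonal to the feature's (low cosine similarity), and the reverse can also happen.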
Negative Logits
olx            -1.23
haup           -1.20
lara           -1.18
lola           -1.18
mef            -1.17
fta            -1.16
ibiza          -1.16
sofia          -1.15
lamborghini    -1.12
lidl           -1.11
Positive Logits
almaz             0.59
ErrIntOverflow    0.57
ALLENGE           0.56
figer             0.55
/**               0.55
الدولى            0.53
TagMode           0.53
@"/               0.52
LabelTagHelper    0.52
govine            0.51
Activation Density: 0.568%
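The sketch below shows how a density figure like the one above could be computed. It assumes that activation density is the fraction of evaluated tokens on which the feature's activation exceeds a threshold of zero. Both the threshold and the token sample are assumptions; the dashboard does not state them. `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Hypothetical activations for four tokens: the feature fires on one of them.
density = activation_density([0.0, 0.0, 1.2, 0.0])
print(f"{density:.3%}")  # 25.000%
```

Under this definition, a density of 0.568% means the feature is nonzero on roughly 1 token in 176. That is consistent with a sparse feature tied to a narrow topic.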