Explanations
references to individuals or groups of people
Negative Logits
Token        Logit
tnc          -0.88
srfAttach    -0.87
Accessory    -0.78
COMPLE       -0.72
NES          -0.70
uner         -0.69
efully       -0.68
iary         -0.64
Prov         -0.64
CCC          -0.64
Positive Logits
Token            Logit
smugglers        1.04
who              1.00
folk             0.95
underestimate    0.84
perceive         0.83
wanting          0.83
clam             0.82
else             0.81
prefer           0.78
cared            0.77
Activations Density: 0.104%