Explanations
phrases related to declining to comment
mentions of refusal or declines to provide information
Negative Logits
Token             Logit
abiding           -0.92
soDeliveryDate    -0.65
ersed             -0.61
pires             -0.61
rotten            -0.60
oplan             -0.60
grass             -0.60
loving            -0.59
righteous         -0.58
RAW               -0.58
Positive Logits
Token             Logit
disclose          1.16
specify           1.14
discuss           1.12
speculate         1.10
comment           1.09
endorse           1.09
identify          1.06
quantify          1.06
clarify           1.02
characterize      1.00
Activations Density: 0.065%
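
The density figure can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data (13 active positions out of 20,000) are all hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero activation.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 13 of 20,000 token positions.
acts = [0.0] * 20000
for i in range(13):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.065%
```

A density this low means the feature is sparse: it fires on roughly 6 or 7 tokens in every 10,000, which fits a feature tied to a narrow pattern such as declining to comment.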