INDEX
Explanations
quotations within text
instances of quotation marks in the text
Negative Logits
tradem          -0.67
rall            -0.64
etheless        -0.63
mathemat        -0.63
abduct          -0.62
agre            -0.62
exting          -0.61
steroids        -0.59
outsourcing     -0.59
newcom          -0.59
Positive Logits
soDeliveryDate  0.91
no              0.88
I               0.87
false           0.86
true            0.83
Hey             0.82
die             0.82
classic         0.81
don             0.81
nothing         0.81
Activations Density: 0.147%