INDEX
Explanations
instances of words related to deception or falsehood
past tense verbs
Negative Logits
DNA               -0.76
arta              -0.75
illin             -0.68
wcsstore          -0.67
Introduced        -0.67
soDeliveryDate    -0.67
metic             -0.64
OUNT              -0.63
ŃĶ                -0.63
thur              -0.62
Positive Logits
ied               0.97
IED               0.83
CLASSIFIED        0.77
Franch            0.74
imentary          0.72
gments            0.70
Io                0.70
wic               0.70
ying              0.69
inois             0.68
Activations Density: 0.009%