INDEX
Explanations
references to personal ownership or possessive language
Negative Logits
edn        -0.17
Lip        -0.15
reven      -0.15
ATAL       -0.15
Platt      -0.14
orns       -0.14
Intent     -0.14
Ỽt         -0.13
asca       -0.13
iban       -0.13
Positive Logits
age         0.18
ãĤº         0.17
ovol        0.15
estic       0.15
oslav       0.15
Jerseys     0.14
quarter     0.14
mono        0.14
åĽº         0.14
mage        0.14
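The source does not say how these logit lists are produced. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction onto each token's unembedding vector, then keep the k most negative and the k most positive tokens. The sketch below assumes that method. The function names and the toy two-dimensional vectors are hypothetical and illustrative only, not the dashboard's actual code or weights.

```python
def logit_effects(decoder_vec, unembed):
    """Dot product of a (hypothetical) feature decoder direction with each token's
    unembedding vector. unembed maps token string -> list of d_model floats."""
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return (top, bottom): the k largest effects (most positive first) and the
    k smallest effects (most negative first), as (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]
    top = list(reversed(ranked[-k:]))
    return top, bottom


# Toy example: made-up 2-d vectors chosen so the first coordinate reproduces
# four of the values listed above.
decoder = [1.0, 0.0]
unembed = {
    "age": [0.18, 0.0],
    "edn": [-0.17, 0.0],
    "mono": [0.14, 1.0],
    "Lip": [-0.15, 2.0],
}
top, bottom = top_and_bottom(logit_effects(decoder, unembed), k=2)
```

Here `top` holds `age` and `mono`, and `bottom` holds `edn` and `Lip`. Sorting once and slicing both ends keeps the positive and negative lists consistent with each other.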
Activation Density: 0.035%
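Activation density is usually the fraction of token positions on which a feature fires. A density of 0.035% is a fraction of 0.00035, or about 35 firing positions per 100,000 tokens. The source does not give the firing threshold or the dataset. The sketch below assumes a strict "activation > 0" rule, and its helper name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds `threshold`
    (assumed firing rule: strictly greater than zero by default)."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 35 firing positions out of 100,000 tokens corresponds to a density of 0.035%.
acts = [0.0] * 99_965 + [1.2] * 35
density = activation_density(acts)
```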