Explanation
Instances of misinformation or misleading statements in text
Negative Logits
BytesLike          -0.40
ISupport           -0.38
useAppContext      -0.37
JScrollPane        -0.37
XmlSchema          -0.37
isContained        -0.37
Cyfarwyddwr        -0.36
theoretical        -0.36
tearDown           -0.35
Географија         -0.35
Positive Logits
fooled             0.63
betweenstory       0.60
confuse            0.59
mimicking          0.57
confund            0.57
principalColumn    0.57
deceived           0.57
disambiguazione    0.57
confusing          0.56
misled             0.55
Activation Density: 0.494%