INDEX
Explanations
instances of deceitful or false representations of events or actions
Negative Logits
Token             Logit
Wiktionnaire      -0.59
disambiguazione   -0.56
تفصیلات           -0.54
Ikus              -0.52
+#+               -0.51
debout            -0.51
Wilber            -0.50
igkeit            -0.50
ToFit             -0.49
::$_              -0.49
Positive Logits
Token             Logit
falsely           1.54
pretending        1.52
pretended         1.50
pretend           1.47
fake              1.46
pretends          1.43
false             1.31
Fake              1.28
illusion          1.25
fake              1.25
Activations Density: 0.624%
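As a rough illustration of what a density figure like this usually means, the sketch below computes the fraction of tokens on which a feature has a nonzero activation. This is an assumption about the metric, not something the page states; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 tokens, 6 of which activate the feature.
toy = [0.0] * 994 + [0.8, 1.2, 0.3, 2.1, 0.5, 0.9]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.600%
```

Under this reading, a density of 0.624% would mean the feature fires on roughly 6 of every 1,000 tokens sampled.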