INDEX
Explanations
language related to fabricated or misleading information, possibly in the context of politics or media
discussions of manipulation or distortion of truth
Negative Logits
DragonMagazine  -0.66
wana            -0.65
âĨij            -0.63
Originally      -0.62
adobe           -0.61
trak            -0.61
Aberdeen        -0.59
raltar          -0.58
ursday          -0.58
Warehouse       -0.58
Positive Logits
predetermined   0.79
inevitably      0.72
meaningless     0.72
truths          0.71
insur           0.70
disson          0.69
trivial         0.69
downright       0.69
irresist        0.68
subconscious    0.68
Activations Density: 1.139%