Explanations
references to deception or falsehoods, particularly concerning "fake" news and related concepts
Negative Logits
Token               Logit
'\\;'               -0.72
ſhall               -0.71
KommentareTeilen    -0.70
muſt                -0.68
findpost            -0.63
anſ                 -0.63
pouvoit             -0.62
ölkerung            -0.61
اریخ                -0.60
>--}}               -0.60
Positive Logits
Token               Logit
fake                1.25
faking              1.18
faked               1.13
fakes               1.10
fake                1.07
Fake                1.06
Fake                1.03
mock                1.01
phony               1.00
pretending          0.96
Activations Density: 2.747%