Explanations
references to deceit and misinformation
indications that something is not genuine or real
claims that are false
Negative Logits
wapV           -0.42
Pill           -0.37
袱             -0.36
eneuve         -0.36
TextAlign      -0.35
windowFixed    -0.35
StreetMap      -0.34
vård           -0.34
собі           -0.33
OGND           -0.33
Positive Logits
fake           0.91
faked          0.87
fake           0.84
phony          0.80
faking         0.79
Fake           0.75
fakes          0.74
Fake           0.73
FAKE           0.71
artificial     0.69
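The page does not say how these logit scores are produced. A common approach for feature dashboards, assumed here, is to take the dot product of the feature's output (decoder) direction with each token's unembedding vector, then list the highest and lowest results. The sketch below uses toy three-dimensional vectors; the function names `logit_effects` and `top_and_bottom`, the vectors, and the resulting numbers are all hypothetical.

```python
from typing import Dict, List, Tuple

Score = Tuple[str, float]

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> List[Score]:
    """Score every token by the dot product of the feature's decoder direction
    with that token's unembedding vector (an assumed scoring rule)."""
    scores = [
        (token, sum(d * u for d, u in zip(decoder_dir, vec)))
        for token, vec in unembed.items()
    ]
    # Highest (most promoted) first, lowest (most suppressed) last.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores

def top_and_bottom(scores: List[Score], k: int) -> Tuple[List[Score], List[Score]]:
    """Return the k most positive and the k most negative scores."""
    k = max(0, min(k, len(scores)))  # clamp so k=0 or oversized k behaves sensibly
    positive = scores[:k]
    # Reverse the tail so the most negative score comes first, as on the page.
    negative = scores[len(scores) - k:][::-1]
    return positive, negative

# Hypothetical toy data: a 3-dimensional residual stream and four tokens.
decoder_dir = [1.0, 0.0, -0.5]
unembed = {
    "fake": [0.9, 0.1, 0.0],
    "phony": [0.8, 0.0, 0.0],
    "Pill": [0.0, 0.3, 0.7],
    "TextAlign": [-0.2, 0.5, 0.0],
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Positive:", positive)
print("Negative:", negative)
```

Under this assumption, a token appears in the positive list when its unembedding points in roughly the same direction as the feature's output, so the feature directly raises that token's logit; the negative list holds the tokens it most directly suppresses.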
Activation Density: 0.478%
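Activation density is usually read as the percentage of tokens in a sample dataset on which the feature is active. That definition is assumed here, as is the zero threshold for counting a token as "active". On that reading, 0.478% means the feature fires on roughly one token in every 209 (100 / 0.478 ≈ 209). The function name `activation_density` and the sample values below are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires (activation > threshold).
    The zero default threshold is an assumption, not taken from the page."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: the feature fires on 3 of 1,000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3f}%")  # 0.300%
```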