Negative Logits
Cure        -0.09
watch       -0.08
Watch       -0.08
ાન          -0.08
watching    -0.08
lockers     -0.08
لاس         -0.07
teme        -0.07
Watch       -0.07
casser      -0.07
Positive Logits
truthful        0.15
misleading      0.13
misinformation  0.13
truth           0.13
deceit          0.13
deceptive       0.13
deceive         0.12
truths          0.12
deception       0.11
fraudulent      0.11
Activations Density 0.084%