Explanations
terms related to misleading or deceptive information
terms related to deception or misleading information
Negative Logits
riot        -0.78
hens        -0.76
oleon       -0.74
ateur       -0.73
mega        -0.72
mun         -0.72
foreseen    -0.71
────────    -0.71
area        -0.68
aldo        -0.67
Positive Logits
misleading      0.89
statements      0.85
disclosures     0.84
misrepresent    0.83
misled          0.83
ingly           0.80
deceive         0.79
omission        0.78
mislead         0.77
falsely         0.76
Activations Density 0.050%