Explanations
words related to deception or misrepresentation
terms related to misrepresentation and distortion of information
Negative Logits
rises     -0.77
force     -0.72
achine    -0.70
¯¯¯¯      -0.68
ça        -0.66
spot      -0.65
hungry    -0.65
fi        -0.64
irez      -0.63
====      -0.63
Positive Logits
misrepresent    0.86
inaccur         0.84
distortions     0.82
distort         0.80
distortion      0.80
inaccurate      0.78
falsely         0.76
distorted       0.74
omission        0.73
perceptions     0.71
Activations Density: 0.062%