Explanations
words related to distortion or distorted forms
terms related to misinformation and distortion
Negative Logits
Token           Logit
ept             -0.83
¯¯¯¯            -0.81
============    -0.77
cript           -0.74
edience         -0.71
fighters        -0.69
ailable         -0.69
alions          -0.69
APTER           -0.67
Flu             -0.66
Positive Logits
Token           Logit
distortions     0.96
distorted       0.96
distort         0.96
distortion      0.93
perceptions     0.75
adoes           0.74
havoc           0.70
usional         0.69
ibly            0.68
oscope          0.64
Activations Density: 0.036%