INDEX
Explanations
phrases related to perpetuating negative actions or behaviors
political or social critique, particularly regarding support for certain agendas or movements
Negative Logits
ĸļ                -0.75
eline             -0.73
appreciated       -0.70
DragonMagazine    -0.67
survived          -0.66
tackle            -0.66
Clear             -0.65
imentary          -0.65
fax               -0.65
thanked           -0.65
Positive Logits
false             1.16
harmful           1.16
inaccurate        1.15
misinformation    1.15
falsehood         1.13
misleading        1.12
unrealistic       1.11
irresponsible     1.11
distorted         1.09
incorrect         1.08
Activations Density: 1.703%