Explanations
expressions related to political decisions and their potential consequences
conditional phrases indicating potential future outcomes or consequences
Negative Logits
guesses      -0.76
noticed      -0.67
sang         -0.67
told         -0.67
confessed    -0.66
mastered     -0.65
Printed      -0.65
Reporting    -0.64
narrated     -0.64
Named        -0.64
Positive Logits
undermine    1.47
endanger     1.41
exacerbate   1.38
worsen       1.38
impede       1.38
jeopard      1.34
hinder       1.34
weaken       1.34
violate      1.33
devast       1.31
Activations Density: 0.206%