INDEX
Explanations
phrases related to political discussions
conjectures or predictions about reactions to situations
Negative Logits
horizont       -0.63
collaps        -0.59
destro         -0.56
roma           -0.56
canopy         -0.54
maintenance    -0.53
branching      -0.53
upkeep         -0.53
phys           -0.52
foliage        -0.51
Positive Logits
cynicism          0.72
disingen          0.71
rhet              0.68
rhetorical        0.67
hypocrisy         0.62
understatement    0.60
irony             0.58
Orwell            0.57
rael              0.56
hypocritical      0.56
Activations Density: 1.274%