INDEX
Explanations
phrases mentioning the potential for negative consequences or harmful impacts
occurrences of the word "cause" and its variations, indicating potential consequences or effects
Negative Logits
Token        Logit
stra         -0.76
atu          -0.75
ian          -0.72
zai          -0.69
zb           -0.68
Rated        -0.68
iazep        -0.67
ta           -0.66
ature        -0.66
sonian       -0.66
Positive Logits
Token        Logit
havoc        1.30
headaches    1.03
mayhem       0.97
trouble      0.97
confusion    0.96
unnecessary  0.93
outbreaks    0.90
friction     0.90
panic        0.87
ãĥĨãĤ£       0.87   (byte-level BPE form of "ティ")
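The last positive-logit token, `ãĥĨãĤ£`, looks like mojibake but is consistent with a GPT-2-style byte-level BPE vocabulary, in which each raw byte is displayed as a printable stand-in character. The sketch below assumes that tokenizer family (the source does not name the model) and reverses the mapping to recover the underlying UTF-8 text. The function names are illustrative and are not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is assigned the next code point from 256 upward.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Map a displayed token string back to the text it encodes."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


decoded = decode_token("ãĥĨãĤ£")
print(decoded)
```

Under this assumption the token decodes to the katakana sequence "ティ". Plain ASCII tokens such as `havoc` pass through unchanged.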
Activations Density 0.052%
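As a rough illustration of the density figure, the sketch below assumes that activation density means the fraction of sampled tokens on which the feature fires with a nonzero activation. The source does not define the metric, and the sample data here is hypothetical, chosen only so that the arithmetic reproduces 0.052%.

```python
def activation_density(activations):
    """Fraction of tokens with a strictly positive feature activation."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical sample: the feature fires on 52 of 100,000 tokens.
sample = [1.0] * 52 + [0.0] * 99_948
density = activation_density(sample)
print(f"{density:.3%}")  # 0.052%
```

A density this low means the feature is sparse: under the stated assumption it fires on roughly one token in two thousand.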