Explanations
phrases expressing fear or concern
modal verbs indicating possibilities or uncertainties
Negative Logits
Token        Logit
Dur          -0.72
Works        -0.72
raq          -0.68
Favorite     -0.67
Excellent    -0.67
AMA          -0.66
Excellent    -0.66
DAQ          -0.66
Awesome      -0.66
Rated        -0.64
Positive Logits
Token            Logit
contam           1.32
endanger         1.21
undermine        1.19
impede           1.19
interfere        1.19
violate          1.17
expose           1.16
injure           1.16
inadvertently    1.14
incite           1.14
Activations Density: 0.138%