Explanations
phrases related to denials or refusals
phrases indicating degrees of influence or impact
Negative Logits
Token       Logit
Knot        -0.74
Ashton      -0.72
incinn      -0.67
proverb     -0.65
staples     -0.63
eyebrows    -0.63
Corpus      -0.62
Cumber      -0.62
tales       -0.60
ŃĶ          -0.60
Positive Logits
Token       Logit
whatsoever  1.15
imaginable  0.90
soever      0.80
resembling  0.78
besides     0.77
isons       0.75
ames        0.74
brates      0.71
related     0.69
resembles   0.69
Activations Density: 0.027%