INDEX
Explanations
adverbs that emphasize the degree or intensity of an action or attribute
strong adjectives indicating significant changes or increases
Negative Logits
Token      Logit
gres       -0.73
ividual    -0.71
acity      -0.69
rity       -0.69
ourke      -0.68
pty        -0.68
guyen      -0.67
ioch       -0.66
Farmers    -0.66
busters    -0.66
Positive Logits
Token        Logit
diver        0.83
tuned        0.82
sharply      0.78
downwards    0.77
cut          0.77
downward     0.76
distingu     0.73
curtail      0.72
distinguish  0.71
discontin    0.70
Activations Density: 0.016%