Explanations
adverbs describing the degree or intensity of an action
expressions of surprise or amazement
Negative Logits
iership     -0.78
angering    -0.74
icide       -0.72
needs       -0.70
uries       -0.70
visory      -0.68
hurts       -0.68
sucks       -0.66
orem        -0.65
igate       -0.64
Positive Logits
similarities    0.89
juxtap          0.85
resemblance     0.80
similarity      0.79
strikingly      0.78
contrasting     0.76
curiously       0.72
reactions       0.72
mention         0.71
unexpected      0.70
Activations Density: 0.323%