INDEX
Explanations
words related to surprising or impressing actions or events
expressions related to feelings of surprise or impressiveness
Negative Logits
glas          -0.76
concentrated  -0.65
sun           -0.65
raltar        -0.62
inatory       -0.62
denying       -0.61
uneven        -0.61
porary        -0.58
misplaced     -0.56
basis         -0.55
Positive Logits
ingly         0.97
brate         0.85
brates        0.83
us            0.80
herself       0.79
yourselves    0.78
himself       0.76
him           0.75
ourselves     0.75
ively         0.74
Activations Density: 0.400%