INDEX
Explanations
words related to emotional or surprising reactions
Negative Logits
slips     -0.66
nect      -0.65
uben      -0.63
illusion  -0.63
omo       -0.63
itamin    -0.61
spring    -0.61
pleted    -0.61
forearm   -0.60
annexed   -0.60
Positive Logits
me             1.02
onlook         1.01
us             0.97
ingly          0.96
him            0.83
commenters     0.81
audiences      0.78
investigators  0.78
reviewers      0.77
readers        0.76
Activations Density: 0.110%