INDEX
Explanations
phrases related to personal experiences and stories, including reflections, anecdotes, and personal opinions
Negative Logits
Became      -0.74
iencies     -0.71
orks        -0.70
Ago         -0.69
hips        -0.67
renches     -0.66
storms      -0.65
reys        -0.64
css         -0.64
doms        -0.64
Positive Logits
favourite    1.29
favorite     1.28
own          1.21
biggest      1.09
favorites    1.08
consolation  1.08
strongest    1.06
preferred    1.03
rightful     1.03
choice       1.02
Activations Density: 9.456%