INDEX
Explanations
mentions of specific titles, particularly related to movies and shows
proper nouns and titles related to various forms of media and personal experiences
Negative Logits
themselves      -0.86
Himself         -0.77
himself         -0.70
yourselves      -0.70
headquartered   -0.65
ributes         -0.64
Their           -0.63
prominently     -0.63
arnaev          -0.60
idates          -0.60
Positive Logits
colleague       1.10
myself          0.96
husband         0.93
friends         0.93
roommate        0.92
friend          0.90
buddies         0.89
favorite        0.89
girlfriend      0.88
buddy           0.88
Activations Density: 0.300%