INDEX
Explanations
references to magazine covers and cover stories
Negative Logits
\\\\\\\\    -0.77
GOODMAN     -0.73
UTE         -0.70
Pillar      -0.70
NF          -0.68
Daniels     -0.66
Rowe        -0.63
_-          -0.62
Nare        -0.62
Pearce      -0.61
Positive Logits
ilee        0.75
ayn         0.72
emate       0.72
hots        0.71
wit         0.71
racted      0.71
cand        0.70
actor       0.70
asser       0.69
iles        0.69
Activations Density: 0.075%