Explanations
mentions of specific celebrities, particularly Beyoncé and Rihanna
Negative Logits
ugu         -0.73
hemor       -0.60
positional  -0.60
igun        -0.59
Assembly    -0.58
ategory     -0.58
rink        -0.58
rongh       -0.58
ictionary   -0.57
deduction   -0.57

Positive Logits
cé           1.51
Beyon        0.98
ce           0.98
gments       0.92
nect         0.91
issance      0.87
kees         0.85
bird         0.83
ciples       0.82
tics         0.82
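
The two logit lists rank vocabulary tokens by how strongly the feature pushes their output logits up or down. The sketch below shows one common way such lists are produced. It assumes, without confirmation from this dashboard, that a token's score is the dot product of the feature's decoder direction with that token's unembedding column. The function name `logit_effects` and the toy vectors are hypothetical, and the toy scores are not the values listed above.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by direct logit effect.

    Assumption (hypothetical): a token's score is the dot product of the
    feature's decoder direction with that token's unembedding column.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return positive, negative


# Toy 2-dimensional example with made-up vectors.
decoder = [1.0, 0.5]
unembed = {
    "cé": [1.0, 1.0],
    "Beyon": [0.5, 0.5],
    "ugu": [-0.5, -0.5],
    "rink": [0.0, -1.0],
}
pos, neg = logit_effects(decoder, unembed, k=2)
print(pos)  # [('cé', 1.5), ('Beyon', 0.75)]
print(neg)  # [('ugu', -0.75), ('rink', -0.5)]
```

Sorting once and slicing from both ends gives the positive and negative lists in a single pass over the vocabulary.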

Activation density: 0.003%
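
An activation density of 0.003% means the feature fires on roughly 3 out of every 100,000 tokens, so it is a very sparse feature. The sketch below computes density as the fraction of tokens whose activation exceeds a threshold. The threshold of 0 and the name `activation_density` are assumptions for illustration, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature fires (activation > threshold).

    Assumption (hypothetical): "fires" means strictly above `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 3 firing tokens out of 100,000 reproduces the 0.003% figure.
acts = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    acts[i] = 1.2  # arbitrary positive activation
density = activation_density(acts)
print(f"{density:.3%}")  # 0.003%
```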