Explanations
Mentions of the word "Swift," particularly in reference to Taylor Swift's music or brand.
Negative Logits
Token     Logit
voie      -0.15
grues     -0.15
šk        -0.14
PF        -0.14
Dennis    -0.14
/=        -0.14
ysa       -0.13
Truthy    -0.13
Tro       -0.13
ogie      -0.13
Positive Logits
Token     Logit
s         0.15
ened      0.15
filer     0.15
arna      0.14
à¤ķथ     0.14
ë§IJ      0.14
못        0.14
608       0.14
Ctrls     0.14
ext       0.14
Activations Density: 0.002%