Explanations
mentions of prominent individuals or celebrities, particularly in contexts related to social issues or challenges they face
Negative Logits
upert     -0.07
ernet     -0.07
achten    -0.07
irk       -0.06
uning     -0.06
Wake      -0.06
á»Ń       -0.06
vid       -0.06
aucoup    -0.06
Hav       -0.06
Positive Logits
-ves       0.07
elter      0.06
gerald     0.06
LK         0.06
icrous     0.06
ë¡ł        0.06
çģ         0.06
arian      0.06
Feinstein  0.06
ycle       0.05
Activations Density 0.039%
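
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number might be computed, assuming "activations density" means the percentage of tokens on which the feature's activation is above zero. The page does not define the term, so that reading is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 tokens, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.2, 0.4, 3.1, 0.9]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.039% would correspond to the feature firing on roughly 39 of every 100,000 tokens.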