INDEX
Explanations
references to celebrities
Negative Logits
Token     Logit
Blocks    -0.75
Flow      -0.72
sis       -0.68
nav       -0.68
cture     -0.67
Mines     -0.67
ieves     -0.66
flow      -0.66
odes      -0.65
mol       -0.63
Positive Logits
Token         Logit
celebrity     3.39
celebrities   2.63
celeb         2.49
Celebrity     2.29
cele          1.95
Celeb         1.93
fame          1.57
cele          1.55
Cele          1.51
superstar     1.46
Activations Density: 0.014%