INDEX
Explanations
information related to personal preferences or favorites
the concept of personal favorites
Negative Logits
Token      Logit
ulative    -0.81
idem       -0.81
aping      -0.79
ural       -0.78
uid        -0.78
aton       -0.77
asse       -0.76
attle      -0.76
heed       -0.75
lam        -0.73
Positive Logits
Token      Logit
haun       0.94
Favorite   0.89
haunt      0.86
pokemon    0.83
spots      0.81
scenes     0.79
spot       0.78
hobbies    0.78
tricks     0.77
snack      0.77
Activations Density: 0.048%