INDEX
Explanations
prompts asking for opinions, thoughts, influences, or experiences
questions that inquire about personal experiences or preferences
Negative Logits
geoning      -0.80
alde         -0.77
het          -0.77
wald         -0.73
oval         -0.72
ammed        -0.69
overe        -0.69
aea          -0.68
avage        -0.68
osta         -0.68
Positive Logits
favourite        1.23
favorite         1.17
Favorite         1.15
Favorite         1.12
hobbies          1.00
favourites       0.94
favorites        0.93
coolest          0.92
misconceptions   0.90
inspir           0.87
Activations Density 0.172%
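
The page does not state how the density figure is computed. One common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 2 of which activate the feature.
sample = [0.0] * 997 + [0.4, 1.2, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.172% would mean the feature fires on roughly 17 of every 10,000 tokens, which fits a feature tied to a narrow topic such as favorites and personal preferences.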