Explanations
questions about personal preferences and experiences
Negative Logits
Token        Logit
hel          -0.18
oste         -0.17
ple          -0.14
hel          -0.14
isphere      -0.14
inda         -0.14
gren         -0.14
Barth        -0.14
aks          -0.14
ple          -0.13
Positive Logits
Token        Logit
favourite    0.22
favorite     0.22
favorite     0.20
Describe     0.19
Describe     0.19
Favorite     0.17
Favorite     0.17
describe     0.17
describe     0.15
otto         0.14
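Logit tables like the two above are usually produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k highest and k lowest entries. The sketch below assumes that convention. `decoder_vec`, `W_U`, and `vocab` are hypothetical toy inputs, not this feature's actual weights, and pure Python lists stand in for tensors.

```python
from typing import List, Sequence, Tuple


def feature_logits(decoder_vec: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through a
    hypothetical unembedding matrix W_U (d_model rows x d_vocab columns)."""
    d_vocab = len(W_U[0])
    return [
        sum(decoder_vec[i] * W_U[i][j] for i in range(len(decoder_vec)))
        for j in range(d_vocab)
    ]


def top_and_bottom(
    logits: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k tokens with the largest logits
    (largest first) and the k tokens with the smallest logits (smallest first)."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative
```

With `decoder_vec = [1.0, 0.5]`, `W_U = [[0.2, -0.1, 0.0], [0.1, -0.2, 0.4]]`, and `vocab = ["favorite", "hel", "describe"]`, the projected logits are about 0.25, -0.2, and 0.2, so "favorite" ranks first on the positive side and "hel" first on the negative side.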
Activation Density: 0.039%
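Activation density is typically the percentage of sampled tokens on which the feature's activation is strictly positive, so 0.039% corresponds to roughly 39 active tokens per 100,000. A minimal sketch, assuming activations arrive as a flat list of floats. The function name and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`
    (a hypothetical helper; 0.0 means "any positive activation counts")."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

For example, 39 positive activations among 100,000 tokens gives a density of about 0.039.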