INDEX
Explanations
positive adjectives and superlatives
expressions of personal preference or favorite things
Negative Logits
Token      Logit
oros       -0.60
hai        -0.58
icans      -0.56
etta       -0.55
elf        -0.55
revise     -0.55
ulent      -0.54
oris       -0.54
ãĤ¼        -0.53
resumes    -0.53
Positive Logits
Token       Logit
ones        0.91
standout    0.80
favorites   0.80
none        0.77
liest       0.76
singled     0.74
hardest     0.74
favourites  0.74
none        0.72
Probably    0.71
Activations Density 0.554%
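
The density figure above can be read as the share of token positions on which the feature fires. The sketch below assumes that definition (activation strictly greater than zero counts as "firing"); the dashboard does not state its exact rule. The function name `activation_density` and the sample values are hypothetical, made up for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with activation > 0.
    The source dashboard does not spell out its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations for a single feature (made-up values).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.554% would mean the feature is active on roughly 5 or 6 out of every 1,000 tokens.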