INDEX
Explanations
adjectives describing behaviors or characteristics of people
phrases indicating common behaviors or characteristics of people
Negative Logits
Token          Logit
bernatorial    -0.79
=[             -0.76
ospons         -0.75
everal         -0.69
Login          -0.69
Anniversary    -0.68
Scrib          -0.68
stantial       -0.66
Pastebin       -0.65
nesday         -0.65
POSITIVE LOGITS
Token          Logit
afraid         1.25
tempted        1.13
obsessed       1.12
unwilling      1.10
reluctant      1.09
fools          1.08
unaware        1.07
fooled         1.07
fascinated     1.06
intimidated    1.06
Activations Density: 0.238%