Explanations
action verbs and phrases related to positive attitudes and behaviors
Negative Logits
pora          -0.83
etheus        -0.75
successfully  -0.74
even          -0.70
rimination    -0.67
eatures       -0.66
likewise      -0.65
afort         -0.65
ornia         -0.64
amia          -0.64
Positive Logits
ifiable       0.77
coincidence   0.74
Facts         0.72
darn          0.70
plain         0.69
shrug         0.68
dudes         0.66
fuck          0.65
shrugged      0.65
curiosity     0.65
Activation density: 0.282%