Explanations
phrases indicating an inability to control a strong urge or desire
phrases indicating a sense of needing assistance or support
Negative Logits
theless      -0.74
andom        -0.69
pool         -0.68
bread        -0.67
wood         -0.65
ires         -0.61
isk          -0.60
prints       -0.60
bon          -0.60
oven         -0.60
Positive Logits
Desk          0.73
anybody       0.72
them          0.65
alleviate     0.63
ギ            0.61
sth           0.60
anyone        0.59
financially   0.59
fully         0.59
anymore       0.58
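Logit lists like the two above are commonly produced by projecting a feature's output direction onto each vocabulary token's unembedding vector and keeping the most positive and most negative scores. The sketch below assumes that method; the function name `logit_lens_topk`, the toy vocabulary, and the vectors are hypothetical illustrations, not this dashboard's actual data or code.

```python
def logit_lens_topk(direction, unembed, k=3):
    """Score every token by the dot product of a feature direction with the
    token's unembedding vector, then return the k most positive and the
    k most negative (token, score) pairs.

    direction: list of floats, length d_model (hypothetical decoder vector)
    unembed:   dict mapping token string -> list of floats, length d_model
    """
    scores = {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return positive, negative


# Toy 2-dimensional example; all values are made up for illustration.
direction = [1.0, -0.5]
unembed = {
    "anyone": [0.6, -0.2],
    "oven": [-0.4, 0.4],
    "Desk": [0.8, 0.0],
    "bread": [-0.5, 0.3],
}
pos, neg = logit_lens_topk(direction, unembed, k=2)
print("positive:", pos)
print("negative:", neg)
```

Ranking by raw dot product keeps the sign information, which is why the dashboard can show separate positive and negative lists from the same scores.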
Activation Density: 0.033%
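Activation density is read here as the fraction of token positions on which the feature fires, where "fires" is assumed to mean an activation strictly above zero. Under that assumption, 0.033% is 0.00033, or roughly one token in about 3,000. The helper `activation_density` and its `threshold` parameter below are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumes a position counts as active when activation > threshold
    (threshold defaults to 0.0, i.e. any positive activation).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up example: 3 active positions out of 10,000 -> 0.0003, i.e. 0.03%.
acts = [0.0] * 9997 + [1.2, 0.4, 3.0]
density = activation_density(acts)
print(f"{density:.5f} ({density * 100:.3f}%)")
```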