Explanations
the word "supposed" followed by a verb or noun, indicating expectations or intentions
phrases indicating expectations or societal norms
Negative Logits
tex          -0.66
Ey           -0.59
Bohem        -0.58
Flavoring    -0.58
tein         -0.57
Blaz         -0.57
Splash       -0.56
sv           -0.54
Inventory    -0.54
lves         -0.54
Positive Logits
ALLY         0.74
ILY          0.73
ered         0.68
erest        0.65
"$:/         0.64
ivalent      0.63
escription   0.63
ich          0.62
to           0.62
bene         0.61
Activations Density 0.043%
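
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of sampled tokens on which the feature's activation is above zero; the page does not confirm that definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires
    at all. This definition is not stated on the page itself.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, of which 4 activate the feature.
sample = [0.0] * 9996 + [1.2, 0.3, 2.5, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.040%
```

Under this reading, a density of 0.043% would correspond to the feature firing on roughly 43 of every 100,000 tokens.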