INDEX
Explanations
phrases indicating addition or stacking
phrases that emphasize hierarchical or sequential relationships
Negative Logits
iment         -0.68
Cosponsors    -0.63
shorten       -0.59
ischer        -0.59
ANS           -0.58
more          -0.58
aren          -0.57
ern           -0.57
worst         -0.56
moderators    -0.56
Positive Logits
paying        0.72
steroids      0.66
ĺħ            0.64
ours          0.64
rolet         0.62
Vulkan        0.62
hers          0.61
oxide         0.61
suspending    0.59
standing      0.59
Activations Density: 0.057%