INDEX
Explanations
places or destinations
instances of the word "to"
Negative Logits
selves        -0.91
etheless      -0.77
terday        -0.76
fortunately   -0.71
angered       -0.68
enance        -0.67
worthiness    -0.65
manship       -0.65
ection        -0.63
entit         -0.62
Positive Logits
pless         1.08
extremes      1.07
jail          1.04
bed           1.03
sleep         1.00
lengths       0.93
prison        0.85
war           0.84
bat           0.84
hell          0.81
Activations Density 0.077%