INDEX
Explanations
phrases indicating future progress or direction
phrases that indicate future actions or directions
Negative Logits
ulin      -0.73
uminati   -0.70
eness     -0.68
ules      -0.67
oola      -0.66
ises      -0.65
trak      -0.62
uum       -0.61
odor      -0.59
oup       -0.58
Positive Logits
forward    1.32
forward    1.27
into       1.20
forwards   1.13
Into       1.01
Forward    0.98
into       0.97
INTO       0.96
onward     0.95
onwards    0.91
Activations Density: 0.079%