INDEX
Explanations
phrases related to physical or metaphorical pathways or journeys
phrases indicating a journey or progression
Negative Logits
Token       Logit
ĪĴ          -0.67
reuse       -0.63
anton       -0.63
ervation    -0.61
onics       -0.60
imble       -0.60
ise         -0.59
igel        -0.59
»Ĵ          -0.58
Scene       -0.58
Positive Logits
Token       Logit
nir         0.81
forward     0.79
down        0.77
points      0.73
forth       0.69
ward        0.68
around      0.67
round       0.67
through     0.66
fare        0.65
Activations Density 0.029%