INDEX
Explanations
phrases indicating direction or movement
phrases indicating direction or movement toward a specific target or outcome
Negative Logits
cu          -0.80
glers       -0.72
drivers     -0.66
̶           -0.66
eyes        -0.65
amp         -0.64
chin        -0.64
hai         -0.64
chu         -0.63
span        -0.63
Positive Logits
adulthood   0.98
extinction  0.96
achieving   0.96
wards       0.95
WARD        0.88
completion  0.88
infinity    0.87
completing  0.86
solving     0.83
becoming    0.82
Activations Density 0.058%