Explanations
expressions related to progress or advancement over time
references to the concept of a journey or process
Negative Logits
Token       Logit
itton       -0.82
oute        -0.73
incinn      -0.72
encer       -0.70
iple        -0.68
concess     -0.68
avorite     -0.67
oggles      -0.67
wat         -0.66
mare        -0.64
Positive Logits
Token       Logit
ward        0.89
steps       0.80
WARD        0.75
finding     0.75
fare        0.69
seeing      0.63
aries       0.61
stones      0.61
point       0.61
Step        0.60
Activations Density: 0.013%