INDEX
Explanations
phrases related to direction or guidance
assertions about directionality, particularly regarding positive or negative trajectories
Negative Logits
Token         Logit
Sec           -0.62
ulas          -0.61
Splash        -0.60
çīĪ           -0.60
Availability  -0.58
arcity        -0.58
fame          -0.58
yrs           -0.57
Instit        -0.56
Spa           -0.56
Positive Logits
Token         Logit
direction     2.09
directions    1.83
footsteps     1.31
direction     1.22
Direction     1.18
opposite      1.11
wards         0.99
favor         0.95
WARD          0.93
ward          0.92
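The logit lists above rank vocabulary tokens by how strongly the feature pushes their output logits up or down. The page does not say how these values were computed; a common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding row and then sort. The sketch below uses made-up toy numbers, and the names `logit_effects` and `top_and_bottom` are hypothetical helpers, not part of any dashboard API.

```python
def logit_effects(decoder_direction, unembedding, vocab):
    """Return (token, effect) pairs, one per vocabulary entry.

    Assumption: effect = dot(unembedding row for the token, decoder direction).
    """
    effects = []
    for token, row in zip(vocab, unembedding):
        effect = sum(w * x for w, x in zip(row, decoder_direction))
        effects.append((token, effect))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Toy data only: a 4-token vocabulary and a 2-dimensional model.
vocab = ["direction", "ward", "Sec", "fame"]
unembedding = [[2.0, 0.0], [1.0, 0.5], [-0.5, -0.5], [0.0, -1.0]]
decoder_direction = [1.0, 0.5]

effects = logit_effects(decoder_direction, unembedding, vocab)
positive, negative = top_and_bottom(effects, k=2)
for token, value in positive + negative:
    print(f"{token:<12}{value:+.2f}")
```

A list of pairs is used instead of a dictionary because a vocabulary can contain tokens that look identical once whitespace is stripped, as with the two "direction" rows in the positive list.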
Activations Density 0.148%
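Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. The page does not define it, so the sketch below assumes that reading, with a threshold of zero and invented sample data; `activation_density` is a hypothetical helper name.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Invented sample: the feature fires on 15 of 10,000 token positions.
sample = [0.0] * 9985 + [1.2] * 15
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.148% means the feature is active on roughly 1 or 2 tokens in every 1,000.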