INDEX
Explanations
phrases related to direction or movement
Negative Logits
token         logit
ollar         -0.62
manship       -0.60
pots          -0.60
lett          -0.58
iqueness      -0.58
ãĤ¦ãĤ¹        -0.57
pot           -0.56
situational   -0.56
Ability       -0.56
Actual        -0.56
Positive Logits
token         logit
towards       0.99
toward        0.98
stairs        0.98
downhill      0.95
Ô             0.94
unnoticed     0.94
corridors     0.88
wards         0.86
blindly       0.85
unch          0.85
Activations Density: 2.739%
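
Some tokens in the logit lists (for example `ãĤ¦ãĤ¹`) look like byte-level BPE tokens shown in their raw display form rather than as readable text. The sketch below assumes the vocabulary uses the GPT-2 style byte-to-character mapping; this page does not name the model or tokenizer, so that is an assumption. The helper names `bytes_to_unicode` and `decode_token` are illustrative only.

```python
def bytes_to_unicode():
    """Build a GPT-2 style map from byte values (0-255) to display characters.

    Assumption: the tokens on this page use this mapping; the page itself
    does not say which tokenizer produced them.
    """
    # Bytes that already render as visible characters keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Remaining bytes are moved to code points 256 and above.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Map each display character back to its byte, then decode as UTF-8."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Incomplete byte sequences become U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¦ãĤ¹"))  # → ウス
```

Under the same assumption, a lone `Ô` corresponds to the single byte 0xD4, which is not a complete UTF-8 sequence on its own, so it has no readable decoding by itself.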