Explanations
phrases related to physical exertion or effort
Negative Logits
Token           Logit
psons           -0.70
lys             -0.67
atural          -0.63
obyl            -0.63
Recogn          -0.62
Faces           -0.62
Interstitial    -0.61
redes           -0.59
brance          -0.59
nam             -0.59
Positive Logits
Token           Logit
forward         1.08
chairs          1.00
toward          0.98
boundaries      0.93
back            0.93
towards         0.93
harder          0.92
aside           0.92
onward          0.90
ahead           0.86
Activations Density: 0.507%