INDEX
Explanations
directional words indicating movement or orientation
NEGATIVE LOGITS
usc        -0.17
ieber      -0.15
senal      -0.14
ولي        -0.14
emachine   -0.14
oten       -0.14
户         -0.14
mary       -0.13
uco        -0.13
azzi       -0.13
POSITIVE LOGITS
acades     0.18
Bay        0.16
roup       0.15
leigh      0.15
/inet      0.15
ap         0.15
arya       0.15
tra        0.15
wax        0.15
Bay        0.14
Activations Density: 0.003%