INDEX
Explanations
phrases related to effort and activity levels
Negative Logits
upro      -0.15
byterian  -0.15
enders    -0.14
Kurul     -0.14
raf       -0.14
ufs       -0.14
weets     -0.14
iais      -0.14
iversit   -0.13
çĩ        -0.13
Positive Logits
ocha      0.17
aday      0.17
cket      0.17
placer    0.16
orex      0.15
oney      0.15
estre     0.14
åĿĬ       0.14
leta      0.14
omba      0.14
Activations Density 0.084%
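As a rough illustration of what a density figure like the one above measures, the sketch below computes the fraction of token positions on which a feature is active. It assumes that "density" means the share of positions with an activation above zero, which this page does not state. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" at a position when its
    activation is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, 8 of them active.
sample = [0.0] * 9992 + [1.3, 0.7, 2.1, 0.4, 0.9, 1.8, 0.2, 3.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.084% would correspond to the feature being active on roughly 8 of every 10,000 positions.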