INDEX
Explanations
phrases indicating downward movement or negative progression
Negative Logits
bian      -0.17
ione      -0.16
up        -0.16
aurant    -0.16
inness    -0.16
ж         -0.15
quette    -0.15
itag      -0.15
ynom      -0.14
kate      -0.14
Positive Logits
graded    0.24
wards     0.23
/up       0.21
grades    0.20
played    0.19
/down     0.19
grading   0.19
stairs    0.19
WARDS     0.18
beat      0.18
Activations Density 0.056%