Explanations
Occurrences of the word "up".
Negative Logits
ched        -0.17
üst         -0.16
ecess       -0.15
образ       -0.15
aina        -0.15
Loren       -0.15
tet         -0.15
oral        -0.15
rej         -0.15
ambi        -0.14
Positive Logits
ward        0.19
/down       0.18
wards       0.18
wards       0.17
yun         0.17
WARDS       0.16
ozilla      0.16
rightness   0.16
达          0.16
otre        0.15
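Logit lists like the ones above are usually read as the feature's direct effect on next-token predictions. Below is a minimal sketch, assuming the values come from projecting the feature's decoder vector through the model's unembedding matrix with no further normalization. The dashboard's exact procedure is not stated here. `top_logit_effects`, `w_dec_row`, and `w_unembed` are hypothetical names used for illustration.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Project one feature's decoder direction through the unembedding matrix.

    Assumption: w_dec_row has shape (d_model,) and is the feature's decoder
    vector; w_unembed has shape (d_model, vocab_size). Both are hypothetical
    inputs, not a specific library's API.
    Returns (most_negative, most_positive) lists of (token, value) pairs.
    """
    logits = w_dec_row @ w_unembed  # direct logit effect per vocab token
    order = np.argsort(logits)      # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

A single `argsort` yields both ends of the ranking, so the negative and positive lists come from one pass over the vocabulary.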
Activation Density 0.023%
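Activation density is the fraction of tokens on which the feature is active. A value of 0.023% therefore corresponds to about 23 tokens per 100,000. The sketch below assumes that "active" means an activation strictly above zero. `activation_density` is a hypothetical helper, not the dashboard's own code.

```python
import numpy as np


def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens where the feature fires (activation > threshold).

    Assumption: feature_acts holds one activation value per token, and a
    threshold of 0.0 defines "active".
    """
    acts = np.asarray(feature_acts)
    return float(np.mean(acts > threshold))
```

Multiplying the result by 100 gives the percentage format used above.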