INDEX
Explanations
references to physical touch and emotions
Negative Logits
hips	-0.16
iyat	-0.16
aires	-0.15
monds	-0.14
liga	-0.14
agar	-0.14
/people	-0.14
ега	-0.14
تÙģ	-0.14
abajo	-0.14
Positive Logits
ings	0.17
aroo	0.17
followed	0.17
session	0.16
ero	0.16
ingly	0.16
/update	0.15
down	0.15
tings	0.15
ibr	0.15
Activations Density: 0.174%