INDEX
Explanations
expressions describing experiences or sensations
Negative Logits
ista     -0.17
igham    -0.15
tran     -0.15
iez      -0.15
až       -0.14
ients    -0.14
عار      -0.14
illa     -0.14
Bik      -0.14
utsch    -0.14
Positive Logits
スティ   0.17
nech     0.15
ogl      0.15
984      0.15
oby      0.14
dle      0.14
igue     0.14
353      0.14
mare     0.14
artz     0.13
Activations Density: 0.034%