Explanations
references to situations or phenomena involving emotional or personal states
Negative Logits
imd        -0.20
ätz        -0.17
ippet      -0.17
HING       -0.16
wig        -0.16
atetime    -0.16
herits     -0.14
ropdown    -0.14
kus        -0.14
mand       -0.14
Positive Logits
Whale       0.17
ais         0.15
seau        0.15
ятель       0.14
å£          0.14
abela       0.14
uzzi        0.13
час         0.13
$body       0.13
ack         0.13
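Dashboards of this kind typically produce the positive and negative logit lists above by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below illustrates that idea under stated assumptions: the names `w_dec` (decoder direction, length d_model) and `W_U` (unembedding, d_model rows by d_vocab columns) are hypothetical, plain Python lists stand in for tensors, and the toy numbers are made up rather than taken from this feature.

```python
import heapq
from typing import List, Tuple

def logit_effects(w_dec: List[float], W_U: List[List[float]]) -> List[float]:
    """Project a (hypothetical) decoder direction through W_U.

    Returns one value per vocabulary entry: sum_i w_dec[i] * W_U[i][j].
    """
    d_vocab = len(W_U[0])
    return [sum(w_dec[i] * W_U[i][j] for i in range(len(w_dec)))
            for j in range(d_vocab)]

def top_and_bottom(effects: List[float], vocab: List[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    pairs = list(zip(vocab, effects))
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    return positive, negative

# Toy usage with made-up numbers (2-dimensional residual stream, 4-token vocab).
vocab = ["Whale", "imd", "ack", "kus"]
effects = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.1, 0.0],
                                      [0.1, 0.2, 0.0, 0.1]])
positive, negative = top_and_bottom(effects, vocab, k=2)
print(positive)  # "Whale" then "ack"
print(negative)  # "imd" then "kus"
```

The values shown on the dashboard may additionally be normalized or computed against a specific model's weights; this sketch only shows the ranking step.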
Activation Density: 0.032%
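Activation density is commonly reported as the share of sampled tokens on which the feature fires; at 0.032% that is about 1 token in 3,125. The sketch below assumes "fires" means an activation strictly above a threshold of 0 and that the result is expressed as a percentage; the dashboard's exact definition and token sample are not stated here, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 32 firing positions out of 100,000 sampled tokens gives 0.032 (percent).
print(activation_density([1.0] * 32 + [0.0] * 99_968))
```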