INDEX
Explanations
expressions of willingness to assist or provide support
Negative Logits
اليا       -0.17
iedo       -0.16
iye        -0.15
usta       -0.15
IRM        -0.15
lect       -0.15
gunakan    -0.14
pest       -0.14
allah      -0.14
ере        -0.14
Positive Logits
happy      0.55
Happy      0.49
happy      0.48
Happy      0.46
happiness  0.43
HAPP       0.43
Happiness  0.35
happier    0.32
happ       0.31
happiest   0.30
Activations Density 0.077%
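
As a rough illustration of what a density figure like the one above measures, the sketch below computes the fraction of token positions on which a feature fires. It assumes density means "share of tokens with activation above zero"; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature is
    active at all. The dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, feature active on 8 of them.
sample = [0.0] * 10_000
for i in (12, 530, 2048, 4096, 5000, 7777, 8192, 9999):
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")
```

A density well under 1% indicates a sparse feature, one that is silent on nearly all tokens and fires only in a narrow set of contexts.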