INDEX
Explanations
expressions of concern or caring about others
Negative Logits
ald       -0.17
dum       -0.17
vider     -0.15
edia      -0.15
ovie      -0.15
als       -0.15
ousse     -0.14
eping     -0.14
allest    -0.14
awn       -0.14
Positive Logits
about     0.40
tentang   0.28
about     0.27
About     0.26
ABOUT     0.25
_about    0.25
关于      0.24
About     0.23
-about    0.21
whether   0.19
Activations Density 0.017%