INDEX
Explanations
expressions of enjoyment and positive experiences
Negative Logits
Token         Value
850           -0.17
odes          -0.15
ÎŃν           -0.14
851           -0.14
514           -0.14
æ®            -0.14
366           -0.14
rientation    -0.14
indir         -0.14
ovan          -0.13
Positive Logits
Token         Value
Tit           0.16
ındır         0.15
adi           0.15
ADI           0.14
TabIndex      0.14
chance        0.14
itas          0.14
agus          0.14
beaten        0.14
anki          0.14
Activations Density 0.215%