INDEX
Explanations
expressions of positive emotional responses and feedback
Negative Logits
bias      -0.15
enson     -0.15
ãģĹãģı    -0.14
cone      -0.14
zag       -0.13
upply     -0.13
idis      -0.13
asive     -0.13
fcc       -0.13
á¿ĸ       -0.13
Positive Logits
response   0.56
feedback   0.47
responses  0.46
response   0.43
reaction   0.41
Response   0.40
Response   0.38
reception  0.36
feedback   0.36
reactions  0.36
Activations Density 0.241%