INDEX
Explanations
patterns of negation or rejection in a context
Negative Logits
room     -0.17
ionic    -0.17
une      -0.17
alo      -0.16
iner     -0.16
avan     -0.15
agn      -0.15
robe     -0.15
ui       -0.15
obj      -0.15
Positive Logits
Over     0.19
Part     0.18
Us       0.18
Æł       0.18
Event    0.17
Rel      0.17
Plus     0.17
Pre      0.17
aeda     0.16
idUser   0.16
Activations Density: 0.110%