Negative Logits
Token          Logit
.delta         -0.07
laughed        -0.06
WHICH          -0.06
_comparison    -0.06
literal        -0.06
_answers       -0.06
官             -0.06
Wifi           -0.06
twists         -0.06
Pearl          -0.06

Positive Logits
Token          Logit
-topic          0.07
carpet          0.07
ixa             0.07
Passive         0.06
xe              0.06
.getSource      0.06
_Context        0.06
Hack            0.06
unan            0.06
admin           0.06

Activation density: 0.012%