INDEX
Explanations
phrases related to arguments, debates, and comparisons
Negative Logits
ciating       -0.80
opoly         -0.68
enjoyment     -0.62
heals         -0.62
Reply         -0.61
ilty          -0.60
iban          -0.58
wake          -0.56
inters        -0.55
disapprove    -0.55
Positive Logits
resorted      0.89
recourse      0.80
devised       0.79
resort        0.75
teamed        0.74
opted         0.73
pmwiki        0.72
collaborated  0.70
enlisted      0.69
Firstly       0.69
Activations Density: 0.234%