INDEX
Explanations
phrases related to consequences or actions taken in response to a negative event
contradictory statements or qualifiers related to claims and assertions
Negative Logits
00007       -0.69
Emin        -0.67
Horus       -0.64
wik         -0.64
hiba        -0.63
ディ        -0.63
Printed     -0.62
Dhabi       -0.61
aid         -0.60
pes         -0.59
Positive Logits
anymore         0.82
necessarily     0.80
dislike         0.72
disadvantages   0.71
disadvantage    0.68
merits          0.65
nor             0.65
iven            0.63
disapprove      0.62
backward        0.62
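Each logit list ranks vocabulary tokens by a score. Below is a minimal sketch of how such lists are commonly produced for sparse-autoencoder features. It assumes (not stated on this page) that each score is the feature's decoder direction projected through the model's unembedding matrix, with any final LayerNorm ignored. The helper `feature_logit_lists` and the names `decoder_row`, `W_U`, and `vocab` are hypothetical.

```python
import numpy as np

def feature_logit_lists(decoder_row, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, logit) pairs.

    Hypothetical helper. Assumes a feature's logit effect is its decoder
    direction (shape: d_model) multiplied by the unembedding matrix W_U
    (shape: d_model x vocab_size), ignoring any final LayerNorm.
    """
    logits = np.asarray(decoder_row) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(logits)                           # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -0.2, 0.8, 0.0],
                [0.1,  0.3, -0.4, 0.9]])
decoder_row = np.array([1.0, 0.0])  # picks out the first row of W_U
positive, negative = feature_logit_lists(decoder_row, W_U, vocab, k=2)
print(positive)  # [('c', 0.8), ('a', 0.5)]
print(negative)  # [('b', -0.2), ('d', 0.0)]
```

Under this reading, a large positive value means the feature directly pushes the model toward predicting that token, and a negative value means it pushes away from it.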
Activation Density: 0.342%
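A density figure is usually read as the share of token positions on which the feature fires. The sketch below assumes that definition (activation above a threshold of 0) and uses a hypothetical helper, `activation_density`. The dashboard's exact threshold and token sample are not given here.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds threshold.

    Hypothetical helper; assumes "density" means the share of tokens on which
    the feature fires (activation > threshold).
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```

On that reading, a density of 0.342% means the feature fires on roughly 1 token in 290 (1 / 0.00342 ≈ 292).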