INDEX
Explanations
negations or contradictions
negative phrases or negations
Negative Logits
Advocate      -0.64
stakes        -0.63
Statements    -0.61
Liberties     -0.61
Exposure      -0.61
Surve         -0.60
Zeal          -0.60
ç·            -0.59
kamp          -0.59
Perspective   -0.59
Positive Logits
icably        1.22
ifying        1.21
ching         1.21
ched          1.18
icing         1.15
epad          1.11
realizing     1.10
knowing       1.09
orious        1.02
bothering     1.00
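The two lists above are the ten most negative and ten most positive entries of a per-token logit-effect vector. A minimal sketch of how such lists can be extracted, assuming the vector is already available (in sparse-autoencoder dashboards it is often the feature's decoder direction projected through the model's unembedding matrix, but that is an assumption here). The function name `top_logit_effects` and the toy data are hypothetical.

```python
import numpy as np

def top_logit_effects(effects, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    effects: 1-D array with one logit contribution per vocabulary token
             (assumed to be precomputed, e.g. decoder direction @ unembedding).
    vocab:   list of token strings aligned index-by-index with `effects`.
    """
    order = np.argsort(effects)  # indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example using a few tokens and values from the lists above.
vocab = ["Advocate", "stakes", "knowing", "realizing", "the"]
effects = np.array([-0.64, -0.63, 1.09, 1.10, 0.0])
neg, pos = top_logit_effects(effects, vocab, k=2)
print(neg)
print(pos)
```

Sorting once and reading both ends keeps the negative and positive lists consistent with each other; for a full vocabulary, `np.argpartition` would avoid a complete sort.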
Activation Density: 0.096%
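Activation density is usually read as the share of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly greater than a threshold of 0 (the dashboard's actual threshold and sample set are not stated); `activation_density` is a hypothetical helper name.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`."""
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy example: the feature fires on 96 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:96] = 1.0
print(f"{activation_density(acts) * 100:.3f}%")
```

With 96 active positions out of 100,000, the density is 0.00096, which prints as 0.096%, the same order of sparsity reported above.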