INDEX
Explanations
contradictory statements or stances
Negative Logits
odan        -0.70
anonymity   -0.66
geons       -0.64
missions    -0.64
gee         -0.64
Examiner    -0.62
iger        -0.61
tor         -0.61
asio        -0.60
audience    -0.59
Positive Logits
else        1.55
Else        1.13
resembling  1.04
Else        1.00
imaginable  0.92
pertaining  0.88
happening   0.87
happens     0.86
NESS        0.81
else        0.81
Activations Density 0.341%