Explanations
expressions of complexity and contradiction in arguments
Negative Logits
Token          Logit
iant           -0.14
.gameserver    -0.14
rosso          -0.14
ÌĨ             -0.14
ध              -0.14
ë              -0.14
imizer         -0.13
rint           -0.13
adow           -0.13
Invariant      -0.13
Positive Logits
Token          Logit
etter          0.15
.uf            0.14
-wise          0.14
Sutton         0.14
fair           0.13
whereas        0.13
Fairfield      0.13
awn            0.13
dar            0.12
elligence      0.12
Activations Density: 0.385%