INDEX
Explanations
words related to providing evidence or support for a claim
terms related to substantiation and refutation in arguments
Negative Logits
Mistress    -0.70
Drawn       -0.69
killer      -0.67
eries       -0.67
fork        -0.62
izabeth     -0.62
helm        -0.61
>>>>>>>>    -0.61
crow        -0.61
nesday      -0.61
Positive Logits
acles       0.99
ivity       0.96
acle        0.93
oret        0.92
ctr         0.88
iveness     0.87
race        0.87
iating      0.87
raint       0.84
iation      0.84
Activations Density: 0.016%