Explanations
phrases implying the truth or validity of a statement
phrases that indicate contrasts or negations of statements
Negative Logits
Token         Logit
ngth          -0.73
freezes       -0.70
umbnails      -0.69
alin          -0.63
clone         -0.61
Scores        -0.61
usercontent   -0.59
enaries       -0.59
ription       -0.58
hail          -0.56
Positive Logits
Token         Logit
true           1.46
true           1.40
happening      1.29
untrue         1.15
false          0.99
TRUE           0.99
happen         0.98
occurring      0.98
possible       0.93
achievable     0.92
Activations Density: 0.162%