Explanations
statements expressing high confidence or likelihood
statements expressing certainty or likelihood
Negative Logits
kind        -0.83
inth        -0.82
elle        -0.82
andan       -0.78
elight      -0.77
ollen       -0.72
aredevil    -0.67
uckland     -0.67
istry       -0.66
banks       -0.65
Positive Logits
guessed     0.82
mistaken    0.77
assume      0.76
underest    0.73
guesses     0.70
untrue      0.68
assumed     0.68
infer       0.66
aspir       0.65
evapor      0.65
Activations Density: 0.065%