Explanations
phrases detailing comparisons or expressing a range of options and interpretations
Negative Logits
bilis     -0.45
sez       -0.44
ophers    -0.43
ford      -0.41
none      -0.40
rans      -0.38
None      -0.38
bil       -0.37
eml       -0.37
maksud    -0.37
Positive Logits
thing     3.59
things    2.88
THING     2.79
thing     2.79
Thing     2.55
things    2.49
Thing     2.45
Things    2.43
THINGS    2.38
Things    2.37
Activations Density: 0.343%