INDEX
Explanations
expressions indicating certainty or likelihood of events
phrases that express negation or disbelief regarding various situations or statements
Negative Logits
kefeller          -0.74
Citiz             -0.65
escription        -0.64
İĭ                -0.60
DragonMagazine    -0.58
apons             -0.57
anian             -0.57
pione             -0.57
artney            -0.56
isin              -0.56
Positive Logits
!                 1.08
ðŁĻĤ              1.01
.                 1.00
Nope              0.99
!!!!              0.98
!!!               0.97
!!                0.93
!.                0.93
*.                0.90
%.                0.89
Activations Density 0.204%
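
Some entries in the logit lists (for example "ðŁĻĤ" and "İĭ") look like byte-level BPE token strings rather than readable text. The sketch below assumes the model uses a GPT-2 style byte-level tokenizer, which this page does not state. It rebuilds the standard byte-to-character table and maps a token string back to its raw bytes. The function names token_bytes and decode_token are hypothetical helpers, not part of any library.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2 style byte-level BPE.

    Assumption: the model behind this dashboard uses this scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Return the raw bytes behind a displayed token string.

    Raises KeyError if the string contains a character outside the table.
    """
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode a displayed token to text; invalid UTF-8 becomes U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ðŁĻĤ", "İĭ", "Nope", "!!!!"]:
        print(repr(tok), token_bytes(tok), repr(decode_token(tok)))
```

Under that assumption, "ðŁĻĤ" corresponds to the four UTF-8 bytes of the slightly smiling face emoji (U+1F642), which fits the exclamation-heavy positive list. "İĭ" corresponds to two UTF-8 continuation bytes, so it is a fragment of a longer character and has no standalone rendering.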