INDEX
Explanations
commonly used punctuation marks and symbols
punctuated phrases or clauses within sentences
Negative Logits
ishly         -0.63
oret          -0.61
jah           -0.55
phal          -0.54
jured         -0.52
estone        -0.52
rik           -0.51
OND           -0.51
ingly         -0.50
eday          -0.50
Positive Logits
respectively   1.00
constitutes    0.94
is             0.82
would          0.82
violates       0.81
underscores    0.81
udeb           0.80
seems          0.80
amounted       0.79
etc            0.79
Activations Density: 0.460%