INDEX
Explanations
phrases identifying characteristics or features of something
phrases that state or assert something
Negative Logits
itches    -0.86
actory    -0.73
ffe       -0.71
igraph    -0.67
ievers    -0.67
otte      -0.67
commit    -0.66
ced       -0.63
conn      -0.63
iscal     -0.63
Positive Logits
admittedly     1.06
supposed       1.03
meant          0.99
basically      0.97
essentially    0.89
obviously      0.88
unlikely       0.86
certainly      0.86
probably       0.85
presumably     0.85
Activations Density: 0.135%