INDEX
Explanations
phrases indicating features, qualities, or items in a list
phrases that list or enumerate examples or factors
Negative Logits
wan       -0.69
enary     -0.69
orem      -0.68
orse      -0.66
athing    -0.65
mit       -0.60
uers      -0.60
uni       -0.60
orship    -0.59
idates    -0.59
Positive Logits
namely     0.89
Firstly    0.72
notably    0.63
xual       0.62
including  0.59
viz        0.57
etsk       0.56
redund     0.53
includ     0.53
weddings   0.53
Activations Density 0.266%