INDEX
Explanations
statements introducing examples
phrases that introduce examples or instances
Negative Logits
ailability           -0.67
ements               -0.66
unlaw                -0.65
inventoryQuantity    -0.65
Deal                 -0.64
fulfil               -0.60
sil                  -0.60
iosity               -0.57
iership              -0.57
unification          -0.56
Positive Logits
:#                   0.86
suppose              0.86
,.                   0.69
xon                  0.66
imagine              0.65
oths                 0.64
lihood               0.63
consider             0.63
,                    0.62
:                    0.61
Activations Density: 0.026%