INDEX
Explanations
instances of examples or illustrative cases in the text
Negative Logits
unlaw              -0.71
ailability         -0.71
inventoryQuantity  -0.70
ements             -0.65
sil                -0.64
iership            -0.62
Deal               -0.61
unification        -0.58
fulfil             -0.58
iosity             -0.58
Positive Logits
:#                 0.90
suppose            0.82
,.                 0.69
xon                0.69
:                  0.65
subp               0.65
,                  0.65
oths               0.64
imagine            0.63
Abrams             0.63
Activations Density 0.018%