INDEX
Explanations
phrases indicating intensity or comparison
phrases emphasizing the concept of minimization or a lower bound
Negative Logits
kefeller   -0.70
xs         -0.69
rows       -0.68
inal       -0.63
borg       -0.63
asms       -0.63
gaard      -0.61
dal        -0.59
cats       -0.59
stocks     -0.58
Positive Logits
Gi           0.77
suffice      0.71
Lago         0.70
orah         0.68
FontSize     0.65
agogue       0.63
provocation  0.62
assume       0.62
ruck         0.60
taining      0.60
Activations Density 0.037%