Explanations
verbs indicating assistance or support
Negative Logits
Token              Logit
PreferredItem      -0.77
mã                 -0.69
fä                 -0.64
styleType          -0.64
Tann               -0.62
ALE                -0.62
SequentialGroup    -0.61
TAN                -0.59
cama               -0.59
Mow                -0.58
Positive Logits
Token              Logit
helps              1.85
Helps              1.73
Helps              1.68
helped             1.68
helps              1.66
Helped             1.59
helping            1.56
helped             1.53
Helping            1.36
Helping            1.34
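The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way such lists are produced for sparse-autoencoder features is to multiply the feature's decoder direction by the model's unembedding matrix and keep the extremes. The sketch below assumes that convention; the helper `logit_effects`, its argument layout (a `d_model x vocab_size` unembedding), and the toy vocabulary are hypothetical illustrations, not the dashboard's actual code.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: project one feature's decoder direction onto the vocabulary.

    decoder_direction: length d_model (assumed SAE decoder row for the feature)
    unembed: d_model rows x vocab_size columns (assumed unembedding layout)
    vocab: token strings, one per unembedding column
    Returns (most negative k, most positive k) as (token, logit) pairs.
    """
    d_model = len(decoder_direction)
    # Dot product of the decoder direction with each token's unembedding column.
    logits = [
        sum(decoder_direction[d] * unembed[d][t] for d in range(d_model))
        for t in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive logit effect.
    order = sorted(range(len(vocab)), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary (synthetic numbers).
decoder = [1.0, 0.0]
W_U = [[2.0, -1.0, 0.5, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
neg, pos = logit_effects(decoder, W_U, ["helps", "Tann", "ALE", "mã"], k=2)
print(neg)  # [('Tann', -1.0), ('mã', 0.0)]
print(pos)  # [('helps', 2.0), ('ALE', 0.5)]
```

Sorting all logits once and slicing both ends gives the negative and positive lists from a single pass; a real vocabulary would use a vectorized library instead of nested Python loops.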
Activation Density: 0.146%
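Activation density is the fraction of sampled token positions on which the feature fires. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of zero; the threshold, the helper name `activation_density`, and the synthetic activation list are assumptions for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic sample: the feature fires on 146 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(146):
    acts[i] = 1.3

print(f"{activation_density(acts):.3%}")  # 0.146%
```

A density this low means the feature is silent on the vast majority of tokens, which fits a narrow explanation such as verbs of assistance.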