Explanations
positive outcomes from inputs
Negative Logits
rets            0.38
versing         0.37
占据            0.36
provoca         0.35
Differences     0.35
失去了          0.35
apshot          0.35
esc             0.35
introductions   0.35
discouraging    0.34
Positive Logits
benefitting     2.00
benefiting      1.95
benefit         1.76
beneficio       1.66
benefited       1.62
benefit         1.61
benefitted      1.55
beneficia       1.53
benef           1.46
Benefit         1.43
Activations Density: 0.012%