INDEX
Explanations
phrases indicating contribution or causation
Negative Logits
introd        -0.58
remarqu       -0.56
[]>(          -0.54
>=",          -0.54
approached    -0.54
rolle         -0.54
cinated       -0.54
氓            -0.53
irin          -0.53
DEA           -0.52
Positive Logits
Contributing    0.74
contributing    0.74
contributes     0.73
contribution    0.72
Contribute      0.71
contribute      0.71
contributors    0.71
fueling         0.70
contributor     0.70
contribué       0.67
Activations Density 0.279%
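
The density figure above is most naturally read as the fraction of token positions on which the feature fires at all. The page does not define it, so the sketch below assumes that reading. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active. The dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic illustration only: 279 active positions out of 100,000 tokens.
sample = [1.0] * 279 + [0.0] * 99_721
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.279%
```

Under this reading, a density of 0.279% means the feature is active on roughly 3 of every 1,000 tokens, which fits a feature tied to a narrow word family such as "contribute" and its variants.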