INDEX
Explanations
the presence of specific pronouns and verbs that indicate actions or expectations
Negative Logits
explaining     -1.02
suggestion     -0.96
saying         -0.95
describing     -0.94
suggesting     -0.94
mentioning     -0.94
explains       -0.93
explanation    -0.92
stating        -0.92
explain        -0.91
Positive Logits
divorced           0.54
disbanded          0.54
exits              0.53
AssemblyProduct    0.51
stoppage           0.51
retiring           0.51
parked             0.51
dropout            0.50
uninstall          0.50
uninstall          0.50
Activations Density: 1.094%