Explanations
instances of the phrase "to" indicating intention or action
Negative Logits
holm          -0.17
elik          -0.17
raft          -0.15
jde           -0.15
.Observable   -0.15
riet          -0.14
achers        -0.14
kp            -0.14
oco           -0.14
omnia         -0.14
Positive Logits
conclusion    0.24
conclusions   0.23
terms         0.17
_rat          0.16
Conclusion    0.16
attention     0.15
abrupt        0.15
consensus     0.15
Howe          0.14
grips         0.14
Activations Density: 0.033%