INDEX
Explanations
expressions of desire or intention toward actions and relationships
Negative Logits
ondo      -0.16
McCorm    -0.15
endale    -0.14
PMC       -0.14
resher    -0.14
loi       -0.14
itan      -0.14
sik       -0.14
onto      -0.14
ntag      -0.14
Positive Logits
therefore      0.20
accordingly    0.17
Therefore      0.15
Therefore      0.15
ⓘ              0.14
distr          0.14
だから         0.14
Cam            0.14
donc           0.14
now            0.13
Activations Density 0.695%
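
The density figure above can be read as the share of sampled token positions on which the feature fires. The following is a minimal sketch under that assumption (the dashboard itself does not state how the number is computed); the function name `activation_density` and the toy data are hypothetical and used for illustration only.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Return the fraction of positions with a strictly positive activation.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 7 of which are active.
toy_activations = [0.0] * 993 + [1.5] * 7
print(f"{activation_density(toy_activations):.3%}")  # prints 0.700%
```

Under this reading, a density of 0.695% would mean the feature is active on roughly 7 out of every 1000 sampled tokens.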