INDEX
Explanations
actions or decisions taken instead of others
gerunds and phrases suggesting actions or processes
Negative Logits
oké             -0.62
congratulated   -0.62
Nanto           -0.62
soon            -0.60
ista            -0.59
anian           -0.59
stad            -0.58
ASA             -0.57
âĹ¼             -0.56
kindred         -0.55
Positive Logits
altogether      0.91
anymore         0.90
anything        0.79
outright        0.78
ĨĴ              0.71
necessarily     0.70
nor             0.69
traditional     0.67
actual          0.66
versa           0.65
Activations Density 0.188%
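
As a rough illustration of how a density figure like the one above can be read, the sketch below treats activation density as the fraction of token positions on which the feature fires. This definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the page itself does not state how the value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens with a nonzero activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 188 of which activate the feature.
sample = [0.0] * 99_812 + [2.0] * 188
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.188% means the feature is active on roughly 2 in every 1,000 tokens, which is typical of a sparse feature.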