INDEX
Explanations
phrases questioning or exploring reasons for certain occurrences
expressions of curiosity or inquiries into reasons
Negative Logits
aughed         -0.85
iece           -0.82
lator          -0.80
pione          -0.80
đ              -0.77
vertisement    -0.76
rawdownload    -0.76
Ă              -0.76
û              -0.76
ø              -0.76
Positive Logits
soever         0.98
they           0.94
exactly        0.84
people         0.82
we             0.80
nobody         0.79
someone        0.79
somebody       0.76
there          0.75
anyone         0.73
Activations Density: 0.049%