INDEX
Explanations
statements that question or critique methodological choices or interpretations in scientific studies
Negative Logits
myſelf        -0.80
itſelf        -0.74
SPATH         -0.72
ELTS          -0.71
ſelf          -0.70
WebServlet    -0.70
internetowa   -0.70
ligiloj       -0.70
pleaſure      -0.70
Jefus         -0.68
Positive Logits
...           0.54
...           0.52
↵             0.51
(             0.50
"             0.46
'             0.45
?             0.44
saying        0.44
[             0.43
5             0.43
Activations Density: 0.066%