Explanations
phrases related to expressing opinions or observations
the use of specific pronouns and demonstrative words
Negative Logits
..."          -0.69
.","          -0.65
�             -0.59
\"            -0.59
\"            -0.55
.</           -0.54
...           -0.53
respectively  -0.51
``(           -0.50
…"            -0.49
Positive Logits
odore         1.01
resa          0.95
xiety         0.83
notations     0.79
bidden        0.75
ibliography   0.75
foundland     0.72
swers         0.71
withstanding  0.71
tymology      0.71
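Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k lowest and k highest scoring tokens. The sketch below illustrates that ranking step under those assumptions; the function name `top_and_bottom_logits` and its argument layout (plain Python lists, unembedding stored as d_model rows by vocabulary columns) are hypothetical and not taken from this dashboard.

```python
def top_and_bottom_logits(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct logit contribution.

    Hypothetical helper. Assumes the contribution to token t is the dot
    product of the feature's decoder direction with column t of the
    unembedding matrix.
      decoder_dir: list[float], length d_model
      unembed:     d_model rows, each a list[float] of length len(vocab)
      vocab:       list[str] of token strings
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    # Sort ascending by score; Python's sort is stable, so ties keep vocab order.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first, as in "Negative Logits"
    positive = ranked[::-1][:k]  # most positive first, as in the positive list
    return negative, positive


neg, pos = top_and_bottom_logits(
    [1.0, 0.0],
    [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]],
    ["a", "b", "c"],
    k=2,
)
print(neg)  # [('b', -1.0), ('a', 0.5)]
print(pos)  # [('c', 2.0), ('a', 0.5)]
```

Ordering the negative list from most negative upward matches the layout above, where -0.69 is listed first and -0.49 last.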
Activation Density: 0.803%
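Activation density is usually read as the percentage of sampled tokens on which the feature fires, so 0.803% corresponds to roughly 8 tokens in every 1,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0.0 (as for a ReLU-style feature); the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper. Assumes one activation value per token and that a
    token counts as active only when its activation is strictly greater than
    the threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```

A strict inequality means tokens with exactly zero activation are never counted, which is the natural reading for a feature whose activations are clipped at zero.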