Explanations
phrases related to power and authority
statements involving medical topics or health-related consequences
Negative Logits
]."          -0.41
?).          -0.38
.'"          -0.37
again        -0.37
)."          -0.34
.�           -0.33
etc          -0.33
.''.         -0.32
..."         -0.32
likewise     -0.32
Positive Logits
ãĤ¦ãĤ¹ (ウス)      0.34
soDeliveryDate    0.33
rina              0.32
pport             0.31
ãĤ½ (ソ)          0.31
ãĥ¢ (モ)          0.31
yright            0.31
åī                0.31
æĸ¹ (方)          0.31
erker             0.30
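Logit lists like the two above are usually read as the feature's direct effect on the output vocabulary. The following is a minimal sketch that assumes each value comes from projecting the feature's decoder direction through the model's unembedding matrix. The names `top_logit_effects`, `w_dec_row`, and `w_unembed` are hypothetical. The dashboard may also fold in the final LayerNorm or rescale the values, and this sketch ignores both.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Assumed (hypothetical) inputs:
      w_dec_row: (d_model,) decoder direction of one SAE feature
      w_unembed: (d_model, vocab_size) unembedding matrix
      vocab:     list of token strings indexed by token id
    """
    effects = w_dec_row @ w_unembed           # (vocab_size,) logit change per token
    order = np.argsort(effects)               # token ids, most negative first
    neg = [(vocab[i], float(effects[i])) for i in order[:k]]
    pos = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy usage: 2-dimensional residual stream, 4-token vocabulary.
toy_dir = np.array([1.0, 0.0])
toy_wu = np.array([[0.5, -0.2, 0.1, -0.4],
                   [0.3, 0.3, 0.3, 0.3]])
neg, pos = top_logit_effects(toy_dir, toy_wu, ["a", "b", "c", "d"], k=2)
print("negative:", neg)
print("positive:", pos)
```

The sketch sorts once and slices from both ends. A single pass therefore yields both the "Negative Logits" and "Positive Logits" lists.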
Activation density: 3.512%
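Activation density is commonly defined as the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. Both that threshold and the helper name `activation_density` are assumptions, not details confirmed by this page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    # Fraction of token positions whose activation is strictly above `threshold`.
    # threshold=0.0 is an assumed definition of "active".
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy data: the feature fires on 3,512 of 100,000 sampled token positions.
toy_acts = np.zeros(100_000)
toy_acts[:3_512] = 1.7
print(f"{activation_density(toy_acts):.3%}")  # prints 3.512%
```

On this definition, a density of about 3.5% means the feature is active on roughly one token in 28.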