Explanations
mentions of research topics, including medical terms and themes related to cultural perspectives
"mention" or similar
mentioning the word
Negative Logits
Token              Logit
noqa               -0.49
marginHorizontal   -0.48
justified          -0.47
reasoned           -0.46
<bos>              -0.46
coledì             -0.45
justification      -0.45
dramatique         -0.45
unjustified        -0.43
Justification      -0.43
Positive Logits
Token              Logit
mention            1.00
mentions           0.87
Mention            0.85
mencionar          0.80
BoxDecoration      0.79
mentioning         0.78
menciona           0.77
+#+#               0.77
Mention            0.77
упомина            0.73
Activations Density 0.265%