INDEX
Explanations
negative sentiments or criticisms presented in sentences
hyper-responsiveness or substituted
Negative Logits
Token      Logit
ientras    -0.88
plufieurs  -0.87
للمعارف    -0.86
ſcher      -0.85
queſta     -0.82
ſont       -0.76
myſelf     -0.75
ſſel       -0.75
beſch      -0.74
ésultats   -0.73
Positive Logits
Token      Logit
and        0.40
·          0.38
or         0.38
、         0.37
•          0.35
–          0.34
of         0.33
↵          0.33
amongst    0.32
-,         0.32
Activations Density: 0.185%