INDEX
Explanations
negations or negative constructs in the text
"Not" followed by an adjective
Negative Logits
hatikan     -0.44
videre      -0.42
ändå        -0.41
tetap       -0.40
ферен       -0.40
both        -0.39
rsiniz      -0.39
それでも    -0.38
多人        -0.38
かで        -0.38
Positive Logits
only        1.11
solely      1.11
merely      0.97
only        0.93
onely       0.90
Only        0.89
seulement   0.88
simply      0.86
Only        0.86
ONLY        0.86
Activations Density 0.355%
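
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; the definition (activation strictly above a threshold of zero), the function name `activation_density`, and the toy data are all assumptions for illustration, not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a strictly
    positive activation. The dashboard does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 4 of which fire.
toy = [0.0] * 996 + [0.7, 1.2, 0.3, 2.5]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.355% would correspond to the feature firing on roughly 3 to 4 of every 1000 sampled tokens.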