Explanations
negative phrasing and contrastive statements
Negative Logits
Token              Logit
AsUp               -0.69
makeText           -0.58
DispatchToProps    -0.54
Бахар              -0.45
ниципа             -0.44
XMLSchema          -0.44
spli               -0.43
but                -0.42
లి                 -0.42
]<<"               -0.41
Positive Logits
Token              Logit
merely             0.99
solely             0.97
tantum             0.91
alone              0.91
only               0.89
onely              0.85
wyłącznie          0.85
only               0.81
exclusively        0.80
alnız              0.78
Activations Density: 0.285%