INDEX
Explanations
phrases indicating criticism or nuanced evaluations of behavior
"At" followed by superlative adjectives
at least, at worst, at best
Negative Logits
Token      Logit
ako        -0.47
czy        -0.45
ÉM         -0.44
tung       -0.44
ak         -0.43
tyd        -0.42
FORME      -0.41
Kjelder    -0.41
udy        -0.40
iNdEx      -0.40
Positive Logits
Token      Logit
atleast    1.18
best       1.07
höch       1.03
almeno     1.01
best       0.99
макси      0.99
máximo     0.99
至少       0.98
worst      0.97
worst      0.96
Activations Density 0.210%
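
As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes it is the percentage of sampled tokens on which the feature's activation is above zero. The page itself does not define the metric, so this definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumed definition: density = 100 * (active tokens) / (all sampled tokens).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 tokens, 2 of which activate the feature.
sample = [0.0] * 998 + [3.2, 1.7]
print(f"{activation_density(sample):.3f}%")  # prints "0.200%"
```

Under this assumed definition, a density of 0.210% would correspond to the feature firing on roughly 2 of every 1000 sampled tokens.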