INDEX
Explanations
definite articles that signify emphasis or distinction
Negative Logits
Very         -0.17
VERY         -0.16
very         -0.16
muy          -0.16
Basically    -0.15
hazi         -0.15
Very         -0.15
lẽ           -0.15
ania         -0.15
енка         -0.14
Positive Logits
fault         0.25
anymore       0.25
necessarily   0.25
usual         0.21
sort          0.21
slightest     0.20
kind          0.20
nor           0.20
same          0.19
norm          0.19
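Logit lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below shows that ranking step under those assumptions. `top_logit_effects`, its parameters, and the toy vocabulary and weights are hypothetical. It also ignores any final LayerNorm or scaling that a real dashboard might apply.

```python
import numpy as np

def top_logit_effects(decoder_row, unembed, vocab, k=10):
    # Hypothetical helper, not the dashboard's actual code.
    # decoder_row: (d_model,) feature direction; unembed: (d_model, vocab_size).
    effects = decoder_row @ unembed  # direct effect on each output-token logit
    order = np.argsort(effects)      # token indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_row = np.array([1.0, 0.5])
unembed = np.array([[0.2, -0.1, 0.0, 0.5],
                    [0.2, -0.4, -0.1, -0.2]])
neg, pos = top_logit_effects(decoder_row, unembed, vocab, k=2)
print("negative:", [tok for tok, _ in neg])  # ['tok_b', 'tok_c']
print("positive:", [tok for tok, _ in pos])  # ['tok_d', 'tok_a']
```

Sorting once and reading both ends of the order gives the most negative and the most positive tokens in a single pass.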
Activation density: 0.055%
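The sketch below assumes the density figure is the percentage of sampled tokens on which the feature's activation is above zero. This is a common convention for sparse-autoencoder dashboards but is not confirmed here. Under that reading, 0.055% means the feature fires on roughly 55 of every 100,000 tokens. The `activation_density` function, its `threshold` parameter, and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    # Assumed definition: percentage of tokens whose activation exceeds threshold.
    activations = np.asarray(activations)
    return 100.0 * np.count_nonzero(activations > threshold) / activations.size

# Toy illustration: a feature active on 55 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:55] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.055%
```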