Explanations
describing a state or action after "is/was/are/been"
Negative Logits
гів  0.40
the  0.39
for  0.38
сов  0.37
图形  0.37
where  0.37
을  0.37
敨  0.37
licken  0.35
For  0.35
Positive Logits
ovviamente  0.50
tellement  0.49
настолько  0.48
почти  0.47
ないので  0.47
했지만  0.47
relativement  0.46
dość  0.46
લગભગ  0.46
certes  0.45
Activations Density 0.097%