INDEX
Explanations
names of concepts and their explanations
Negative Logits
confident    0.50
.            0.47
à            0.47
{            0.47
für          0.45
$.           0.45
å            0.44
in           0.44
בין          0.44
(%)          0.43
Positive Logits
ла           0.63
erar         0.52
WOULD        0.48
વરસ          0.47
দারি         0.47
COMPANIES    0.47
⊟            0.46
ERING        0.45
sabbam       0.44
бизне        0.44
Activations Density 0.000%