INDEX
Explanations
qualifications and contrasts
Negative Logits
Result  0.21
Gün  0.20
이는  0.20
Gün  0.20
Ef  0.19
ushroom  0.19
Мо  0.19
이는  0.19
这是  0.19
G  0.19
Positive Logits
albeit  0.26
但不  0.25
albeit  0.20
mainly  0.20
चाहें  0.19
특히  0.18
尤其  0.17
특히  0.17
mainly  0.17
soprattutto  0.17
Activations Density 0.894%