INDEX
Explanations
concepts related to classification and existence
Negative Logits
дан      -0.18
шли      -0.16
bond     -0.15
باشید    -0.15
sollten  -0.15
amage    -0.14
ophobia  -0.14
utto     -0.14
ились    -0.14
strup    -0.14
Positive Logits
uje      0.27
ывает    0.23
uelve    0.22
ает      0.21
ствует   0.21
ίζει     0.21
ує       0.20
ζει      0.19
ulates   0.18
ивает    0.18
Activations Density 0.070%