Explanations
textual elements related to specific numeric or detailed classifications
Negative Logits
вияв    -0.31
есте    -0.25
огра    -0.25
заяв    -0.24
звичай    -0.21
рань    -0.20
иму    -0.18
недел    -0.18
вияви    -0.17
поба    -0.16
Positive Logits
склада    0.17
ちは    0.17
町    0.16
zab    0.15
yk    0.15
Thực    0.14
oeff    0.14
aeda    0.14
otron    0.14
illac    0.14
Activations Density 0.009%