INDEX
    Explanations

    adjectives related to description and evaluation

    Negative Logits
    " Олександ"     -0.19
    " дитини"       -0.18
    " prime"        -0.18
    " Володими"     -0.17
    " тела"         -0.16
    "аном"          -0.16
    " jednoho"      -0.16
    " Statue"       -0.15
    "ophile"        -0.15
    "сте"           -0.15

    Positive Logits
    " пункт"        0.23
    " вариант"      0.22
    " характер"     0.21
    " вияв"         0.20
    "итет"          0.19
    " слой"         0.19
    " порядок"      0.19
    " список"       0.19
    " момент"       0.19
    U+0306 (combining breve)    0.19
    Act Density 0.018%

    No Known Activations