    Explanations

    words related to art and its history

    Negative Logits
     A
    -0.27
    u
    -0.26
    ,
    -0.26
     w
    -0.25
     -
    -0.25
     and
    -0.25
    l
    -0.24
     f
    -0.24
     
    -0.24
     the
    -0.24
    Positive Logits
    ож
    0.31
    ожд
    0.25
    еж
    0.25
    ещ
    0.24
    ращ
    0.24
    еч
    0.24
    ё
    0.23
    нож
    0.22
    ощ
    0.22
    ущ
    0.22
    Act Density 0.025%

    No Known Activations