INDEX
    Explanations

    words related to colors or visual descriptors

    Negative Logits
    glers       -0.74
    earchers    -0.74
    hiba        -0.73
     Starr      -0.72
    undai       -0.72
    perty       -0.70
    cemic       -0.69
    pering      -0.69
    xon         -0.68
    Preview     -0.67
    Positive Logits
    е           1.52
    а           1.52
    о           1.50
    и           1.43
    оÐ          1.39
    Ñĭ          1.37
    Ñĥ          1.30
    л           1.23
    Ñı          1.22
    ÑĢ          1.21
    Activation Density: 0.005%

    No Known Activations
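
Several of the positive-logit tokens (Ñĭ, Ñĥ, Ñı, ÑĢ) look like display strings from a GPT-2-style byte-level BPE vocabulary, where each raw byte is shown as a printable stand-in character. That is an assumption about the tokenizer, not something this page states. Under that assumption, the sketch below rebuilds the standard byte-to-character table and maps a display string back to text. The helper names `bytes_to_unicode` and `decode_token` are illustrative only.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte gets a stand-in code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: stand-in character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display):
    """Map a token display string back to text (assumes byte-level BPE display)."""
    raw = bytearray()
    for ch in display:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. already-decoded Cyrillic)
            # are kept by re-encoding them as UTF-8.
            raw.extend(ch.encode("utf-8"))
    # Incomplete byte sequences become U+FFFD instead of raising.
    return bytes(raw).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["Ñĭ", "Ñĥ", "Ñı", "ÑĢ"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, these four tokens decode to the Cyrillic letters ы, у, я and р, which would put them in the same family as the other positive-logit tokens (е, а, о, и, л). A token such as оÐ ends in a lone lead byte, so it cannot be decoded to a complete character on its own.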