    NEGATIVE LOGITS
    "/usr"        -0.07
    (blank)       -0.07
    " Brook"      -0.07
    " healthy"    -0.07
    ".Rest"       -0.07
    "Vol"         -0.07
    "Language"    -0.07
    " Wiley"      -0.06
    " soup"       -0.06
    "ге"          -0.06
    POSITIVE LOGITS
    " Diamond"    0.20
    " diamond"    0.16
    "Diamond"     0.15
    "diamond"     0.11
    "iamond"      0.11
    " diamonds"   0.10
    " Diamonds"   0.09
    ",↵↵"         0.07
    "atinum"      0.07
    " Platinum"   0.07
    Activation Density: 0.003%

    No Known Activations