    Negative Logits
    Token          Value
    (not shown)    0.90
    (not shown)    0.85
    (not shown)    0.79
    "G"            0.77
    "よい"         0.74
    "I"            0.74
    "ли"           0.73
    (not shown)    0.73
    (not shown)    0.71
    "극장"         0.67
    Positive Logits
    Token          Value
    " fruit"       0.77
    (not shown)    0.71
    " Fruit"       0.71
    "kaart"        0.69
    " T"           0.65
    "ilege"        0.64
    "jete"         0.64
    "่า"            0.63
    " malice"      0.63
    " briefs"      0.63
    Act Density 0.011%

    No Known Activations