    Negative Logits
        Token              Value
        'dır'              0.89
        (token not shown)  0.87
        't'                0.78
        'ians'             0.72
        'mons'             0.67
        'tions'            0.67
        'tn'               0.66
        'ın'               0.65
        ' наук'            0.64  (Russian, "of sciences")
        'd'                0.64
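    Positive Logits
        Token              Value
        ' knife'           0.78
        (token not shown)  0.74
        ' I'               0.70
        'ある'             0.66  (Japanese, "to exist")
        (token not shown)  0.66
        ' Knife'           0.65
        (token not shown)  0.65
        'ك'                0.64  (Arabic letter kāf)
        '"'                0.63
        ' knives'          0.63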
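    The two logit lists rank tokens by the feature's direct effect on next-token scores. Below is a minimal sketch of how such lists are commonly produced for sparse-autoencoder features. It assumes, though this page does not say so, that each score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. `logit_effects`, `top_and_bottom` and the toy numbers are hypothetical. The sketch keeps signs, while the page prints its negative-logit values unsigned.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Score each token by projecting a (hypothetical) feature decoder
    direction of length d_model onto an unembedding matrix of shape
    d_model x len(vocab)."""
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k highest-scoring tokens (largest first) and the k
    lowest-scoring tokens (most negative first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = list(reversed(ranked))[:k]
    negative = ranked[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, three-token vocabulary (made-up numbers).
    vocab = [" knife", "dır", " I"]
    decoder_dir = [1.0, 0.5]
    unembed = [[0.8, -0.2, 0.1],
               [0.0, -0.6, 0.4]]
    effects = logit_effects(decoder_dir, unembed, vocab)
    pos, neg = top_and_bottom(effects, k=1)
    print("positive:", pos)
    print("negative:", neg)
```

    With these toy numbers, ' knife' scores 1.0 * 0.8 + 0.5 * 0.0 = 0.8 and 'dır' scores 1.0 * (-0.2) + 0.5 * (-0.6) = -0.5. Those two tokens therefore head the positive and negative lists respectively.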
    Act Density 0.001%
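    Act Density is read here as the share of sampled tokens on which the feature's activation is above zero. Both that definition and the size of the sample are assumptions, since the page states neither. On that reading, 0.001% is a fraction of 10^-5, or roughly one token in 100,000. A minimal sketch follows; the helper names are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`
    (assumed definition of "Act Density")."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


def expected_firings(density_percent: float, n_tokens: int) -> float:
    """Expected number of firing tokens in a sample of n_tokens,
    given a density expressed as a percentage."""
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    # One nonzero activation among 100,000 tokens (made-up data).
    acts = [0.0] * 99_999 + [2.3]
    density = activation_density(acts)
    print(f"density = {density:.6%}")
    # At 0.001%, a 1,000,000-token sample is expected to contain about 10 firings.
    print(expected_firings(0.001, 1_000_000))
```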

    No Known Activations