INDEX
    Explanations
    NEGATIVE LOGITS
    ky    0.50
    when    0.50
    izzie    0.48
    Ο    0.46
    ppy    0.45
     fleste    0.44
     всех    0.43
     संकेत    0.43
    ociaż    0.43
     Porque    0.43
    POSITIVE LOGITS
     botan    0.48
     glassware    0.46
     beverages    0.44
     함수    0.43
     Beverages    0.42
     artific    0.40
    点を    0.40
    <0xD0>    0.39
    作った    0.39
    ณิต    0.39
    Act Density 0.004%

    No Known Activations