INDEX
    Negative Logits
    Token            Logit
    " hospital"      -0.07
    "(member"        -0.07
    ".complete"      -0.07
    " 사람들이"      -0.06
    " detained"      -0.06
    " jackets"       -0.06
    "ために"         -0.06
    " prostředí"     -0.06
    "Friends"        -0.06
    " enraged"       -0.06
    Positive Logits
    Token                       Logit
    "الح"                       0.07
    "Imp"                       0.07
    "зв"                        0.07
    "cura"                      0.07
    "_symbol"                   0.07
    (token blank in source)     0.06
    "τι"                        0.06
    (token blank in source)     0.06
    "BMW"                       0.06
    (token blank in source)     0.06
    Activation Density: 0.027%

    No Known Activations