INDEX
    Explanations

    danger, crisis, or breach

    Negative Logits
    Token            Value
    "-"              0.60
    "ז"              0.54
    " inventories"   0.53
    " earners"       0.52
    "("              0.51
    "</i>"           0.51
    " 가지"           0.49
    " Adu"           0.49
    " invade"        0.49
    (token missing)  0.49
    Positive Logits
    Token            Value
    " tiden"         0.63
    " γεγον"         0.63
    " Probleme"      0.61
    (token missing)  0.61
    " सतत"           0.59
    "ăț"             0.59
    " podendo"       0.59
    " problemen"     0.59
    " problemas"     0.58
    "ucapkan"        0.57
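    The page does not say how the two logit lists are produced. A common approach for sparse-autoencoder features, assumed here rather than confirmed by the source, is to project the feature's decoder direction onto the model's unembedding matrix and keep the k highest and k lowest scoring tokens. The sketch below shows that idea on toy data; `logit_effects`, `top_and_bottom`, and all numbers are hypothetical. In this sketch the negative list holds signed (negative) scores, whereas the values shown above are unsigned, so the page's sign convention is not stated.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: score(token) = decoder_dir . unembed[:, token]; the dashboard's
# actual computation is not documented on the page.


def logit_effects(decoder_dir: list[float],
                  unembed: list[list[float]],
                  vocab: list[str]) -> list[tuple[str, float]]:
    """Return (token, score) for every vocabulary entry.

    decoder_dir: d_model floats, the feature's output direction (assumed).
    unembed: d_model rows, each with len(vocab) columns.
    """
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores: list[tuple[str, float]], k: int):
    """Split scores into the k most positive and k most negative entries."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = [" crisis", " calm", " breach", " table"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.8, -0.2, 0.6, 0.0],
    [-0.4, 0.9, -0.2, 0.1],
]

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print("positive:", [tok for tok, _ in positive])
print("negative:", [tok for tok, _ in negative])
```

    A real dashboard would use the full unembedding matrix (tens of thousands of columns) and typically a vectorized library, but the ranking logic is the same.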
    Activation Density: 0.116%
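    Activation density is usually read as the share of token positions on which the feature is active; 0.116% would then mean roughly 1 token in 860 (1 / 0.00116 ≈ 862). The sketch below computes it under that assumption; the function names and the zero threshold are hypothetical, since the page does not state how "active" is defined.

```python
# Hypothetical sketch: activation density as the fraction of token positions
# where a feature's activation exceeds a threshold (assumed to be 0.0).


def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def as_percent(density: float, digits: int = 3) -> str:
    """Format a fraction as a percentage string, e.g. 0.001 -> '0.100%'."""
    return f"{density * 100:.{digits}f}%"


# Toy example: the feature fires on 1 of 1000 token positions.
acts = [0.0] * 999 + [2.5]
print(as_percent(activation_density(acts)))
```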

    No Known Activations