    Negative Logits
    " Throughout"      -0.08
    " lom"             -0.08
    " Kart"            -0.08
    " penggunaan"      -0.08
    " semakin"         -0.08
    " Jonas"           -0.07
                       -0.07
    "জন"               -0.07
    " Lanz"            -0.07
    "-virus"           -0.07

    Positive Logits
    " excerpts"         0.10
    "excerpt"           0.08
    " excerpt"          0.08
    "hofer"             0.08
    " filling"          0.08
    "Draft"             0.08
    " trecho"           0.07
    " भर"               0.07
    "Excerpt"           0.07
    " mở"               0.07
    Activation Density: 0.025%

    No Known Activations