INDEX
    Explanations
    Negative Logits
    Token             Logit
    "starts"          -0.07
    " Experienced"    -0.07
    " kinda"          -0.06
    " criter"         -0.06
    " mereka"         -0.06
    " expres"         -0.06
    " sebuah"         -0.06
    (not shown)       -0.06
    " expose"         -0.06
    " Rever"          -0.06
    Positive Logits
    Token             Logit
    "THE"             0.07
    "/proto"          0.07
    " captive"        0.06
    " Victoria"       0.06
    "IZER"            0.06
    " justo"          0.06
    "imize"           0.06
    (not shown)       0.06
    " Jehovah"        0.06
    "/Test"           0.06
    Activation Density: 0.001%

    No Known Activations