INDEX
    Explanations
    Negative Logits
    Token           Logit
     essentially    -0.07
     DECL           -0.07
     DET            -0.07
                    -0.06
    INC             -0.06
    'l              -0.06
     vinc           -0.06
     Finite         -0.06
    eldo            -0.06
     Repair         -0.06
    Positive Logits
    Token           Logit
     disastr        0.07
     autob          0.07
     Cave           0.07
     dragon         0.06
     escaped        0.06
     Marcel         0.06
    oustic          0.06
    -cigarettes     0.06
    зации           0.06
    ğına            0.06
    Activation Density: 0.002%

    No Known Activations