    NEGATIVE LOGITS
    Token             Logit
    " captains"       -0.07
    " killing"        -0.07
    " interesting"    -0.07
    "vehicle"         -0.07
    " thirst"         -0.07
    " CAM"            -0.07
    " manager"        -0.06
    "]))↵"            -0.06
    "IDD"             -0.06
    " initiated"      -0.06
    POSITIVE LOGITS
    Token             Logit
    "_From"           0.07
    "JV"              0.07
    "анов"            0.06
    "(inv"            0.06
    "่าว"              0.06
    "ivic"            0.06
    "essenger"        0.06
    [blank]           0.06
    " منه"            0.06
    "iev"             0.06
    Activation Density: 0.002%

    No Known Activations