    Negative Logits
     gates       -0.07
     weather     -0.06
    ango         -0.06
     Kerala      -0.06
     strange     -0.06
     shields     -0.06
    adj          -0.06
    Dark         -0.05
     directors   -0.05
    >"↵          -0.05
    Positive Logits
    bob          0.07
     gost        0.07
     tert        0.07
    .WEST        0.07
     HE          0.07
    _DER         0.07
     F           0.06
    lenmiş       0.06
     Luz         0.06
    velle        0.06
    Activation Density: 0.012%

    No Known Activations