INDEX
    Explanations
    Negative Logits
        Token             Logit
        "_widgets"        -0.07
        "testdata"        -0.07
        " laboratory"     -0.06
        ".cycle"          -0.06
        " depict"         -0.06
        "iding"           -0.06
        "episode"         -0.06
        (blank)           -0.06
        "Sessions"        -0.06
        " lesser"         -0.06
    Positive Logits
        Token             Logit
        " difference"      0.07
        "crow"             0.07
        " compromised"     0.07
        " watchdog"        0.07
        " differences"     0.07
        " reverted"        0.06
        "ště"              0.06
        (blank)            0.06
        "difference"       0.06
        "antium"           0.06
    Activation Density: 0.013%

    No Known Activations