    NEGATIVE LOGITS
    Token            Logit
    " Islington"     0.40
    " destin"        0.35
    "separated"      0.35
    "attha"          0.34
    (not shown)      0.33
    " arte"          0.33
    " scaled"        0.33
    "olok"           0.33
    (not shown)      0.32
    (not shown)      0.32
    POSITIVE LOGITS
    Token            Logit
    " drivers"       0.92
    " driver"        0.87
    "Driver"         0.85
    "Drivers"        0.84
    " cone"          0.82
    " Driver"        0.80
    " Drivers"       0.79
    " ड्राइवर"        0.77
    "driver"         0.76
    " драй"          0.75
    Activation Density: 0.015%

    No Known Activations