INDEX
    Explanations

    summarizing differences in tables

    Negative Logits
     कर्मचारी  0.73
     આરોપી  0.68
    Help  0.67
    drop  0.66
    וא  0.65
     कर्मचारियों  0.65
     سي  0.64
     Bell  0.63
    Besch  0.63
    tay  0.62
    Positive Logits
    fz  0.83
    pz  0.79
     mechanistic  0.78
    RANGER  0.76
     dez  0.76
     Dex  0.76
    zun  0.75
     jul  0.75
    сера  0.75
     irony  0.75
    Activation Density: 0.005%

    No Known Activations