INDEX
    Explanations

    names with "bert"

    NEGATIVE LOGITS
    "bert"               -1.23
    "BERT"               -0.82
    "berto"              -0.82
    "banking"            -0.73
    "ThroughAttribute"   -0.70
    " Banking"           -0.69
    " banking"           -0.68
    "berts"              -0.68
    " Bert"              -0.67
    "berta"              -0.66
    POSITIVE LOGITS
    "LookAnd"            0.54
    "Портал"             0.53
    "DropTable"          0.51
    " تانيه"             0.48
    "annique"            0.48
    "gbaar"              0.46
    " ResponseEntity"    0.46
    " bought"            0.45
    " Jurí"              0.45
    " lief"              0.44
    Act Density 0.032%

    No Known Activations