INDEX
    Explanations
    Negative Logits
    " add"          -1.61
    " Add"          -1.52
    "add"           -1.49
    " average"      -1.39
    "Add"           -1.37
    " ADD"          -1.26
    "average"       -1.21
    " adding"       -1.16
    " error"        -1.16
    "ADD"           -1.11
    Positive Logits
    " chofe"        0.65
    " detachment"   0.56
                    0.56
    " to"           0.54
    " fevere"       0.54
    " Hift"         0.54
    " Efq"          0.54
    " Houſe"        0.53
    " faſt"         0.51
    " raiſ"         0.50
    Act Density 0.139%

    No Known Activations