    Explanations

    instances of the word "tolerance" or related terms

    references to concepts of tolerance and acceptance

    Negative Logits
        Token           Logit
        "prints"        -0.79
        "tein"          -0.73
        "Downloadha"    -0.69
        "call"          -0.68
        "grave"         -0.67
        "wind"          -0.64
        "pocket"        -0.64
        "guard"         -0.63
        "eu"            -0.62
        "wer"           -0.62
    Positive Logits
        (a leading space inside the quotes is part of the token)
        Token           Logit
        " tolerant"     1.05
        " tolerance"    1.00
        " toler"        0.98
        " tolerate"     0.98
        "olerance"      0.91
        " intolerance"  0.90
        " intoler"      0.87
        " tolerated"    0.81
        "olini"         0.72
        "terness"       0.72
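    Logit lists like the two above are commonly produced by projecting a feature's decoder (output) direction onto each token's column of the model's unembedding matrix and ranking the resulting dot products. This page does not state how its lists were computed, so the sketch below is an assumption: it ignores any final layer norm, and the helper names feature_logits and top_logits, along with the toy vectors, are hypothetical and not data from this feature.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction raises or lowers their logits (a "logit lens" view).
# All vectors are made-up toy values, not data from this feature.


def feature_logits(decoder_vec, unembed):
    """Project a feature's decoder vector onto each token's unembedding column.

    decoder_vec: list of d_model floats (the feature's output direction).
    unembed: dict mapping token -> list of d_model floats (one column of W_U).
    Returns a dict mapping token -> logit contribution (a dot product).
    """
    return {
        tok: sum(a * b for a, b in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }


def top_logits(logits, k):
    """Return (positive, negative): k (token, value) pairs from each end."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    decoder_vec = [0.5, -0.2, 0.8]
    unembed = {
        " tolerant": [1.0, 0.0, 0.7],
        " tolerance": [0.9, 0.1, 0.6],
        "prints": [-0.6, 0.5, -0.4],
        "wind": [0.0, 0.9, -0.3],
    }
    pos, neg = top_logits(feature_logits(decoder_vec, unembed), k=2)
    print("Positive:", pos)
    print("Negative:", neg)
```

    With these toy values the positive list comes out as " tolerant", " tolerance" and the negative list as "prints", "wind". Sorting once and reading both ends keeps the two lists consistent with each other.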
    Activation Density: 0.022%

    No Known Activations