INDEX
    Explanations

    phrases introducing examples or illustrating points

    instances of illustrative examples or case studies

    Negative Logits
     inev     -0.85
    ocr       -0.81
    ess       -0.73
    orate     -0.73
    roy       -0.71
    esses     -0.69
    livion    -0.69
    alysed    -0.68
    ocracy    -0.68
    ima       -0.65
    Positive Logits
     imagine  0.72
     suppose  0.71
    =#        0.65
     hypot    0.64
     Sergio   0.64
    aeper     0.62
     Buff     0.61
     Brief    0.60
    ooters    0.59
    dinand    0.59
    Act Density 0.125%

    No Known Activations