INDEX
    Explanations

    examples and explanations in a context

    phrases that introduce examples or explanations

    Negative Logits
    roy             -0.78
    ess             -0.70
    gaard           -0.69
    owed            -0.69
    esses           -0.68
    inate           -0.65
    ige             -0.64
    GM              -0.63
     inev           -0.63
    ND              -0.61
    Positive Logits
    ierre           0.77
    zech            0.69
    hov             0.68
    tti             0.68
    trak            0.67
    gans            0.66
     Photographer   0.64
    =#              0.64
    ooters          0.63
     Reflex         0.63
    Activation Density: 0.104%

    No Known Activations