INDEX
    Explanations

    adaptation from book

    Negative Logits
    щают         -0.08
     bombers     -0.07
     уход        -0.06
    _IB          -0.06
     rẻ          -0.06
    眼睛         -0.06
     cầm         -0.06
    assic        -0.06
    _STA         -0.06
    RID          -0.06
    Positive Logits
    (whitespace)   0.07
     hates         0.07
     medal         0.07
    lessons        0.06
     ineffective   0.06
     eligible      0.06
     Gems          0.06
    -material      0.06
     frames        0.06
    apple          0.06
    Act Density 0.013%

    No Known Activations