    Negative Logits
    Token           Logit
    "ILD"           -0.07
    (unrendered)    -0.07
    ".:.:."         -0.07
    " отлич"        -0.06
    " Angry"        -0.06
    " khỏe"         -0.06
    " fg"           -0.06
    " Kuala"        -0.06
    "\tdto"         -0.06
    "\tmc"          -0.06
    Positive Logits
    Token           Logit
    "Instantiate"   0.07
    "_human"        0.07
    " gather"       0.07
    ".browser"      0.06
    "Create"        0.06
    " bodily"       0.06
    "Down"          0.06
    " enjoyable"    0.06
    "(token"        0.06
    ".scalar"       0.06
    Activation density: 0.049%

    No Known Activations