    Negative Logits
    Token               Logit
    " friv"             -0.08
    [token not shown]   -0.08
    " extinction"       -0.08
    "注销"              -0.08
    "critic"            -0.07
    "yr"                -0.07
    " raffle"           -0.07
    " 강화"             -0.07
    " Merit"            -0.07
    "daily"             -0.07
    Positive Logits
    Token               Logit
    " кам"              0.09
    " walls"            0.08
    " jag"              0.08
    " bigger"           0.08
    " determined"       0.08
    " חבר"              0.07
    " seams"            0.07
    " Antonio"          0.07
    " verhind"          0.07
    " Стар"             0.07
    Activation Density: 0.001%

    No Known Activations
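
    To put the density figure in perspective: the page does not define how it is computed, so the sketch below assumes it is the fraction of sampled token positions on which the feature's activation is above zero. Under that assumption, 0.001% corresponds to roughly 1 active position per 100,000 tokens. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: one active position among 100,000 token positions.
sample = [0.0] * 99_999 + [0.42]
density = activation_density(sample)
print(f"{density:.3%}")
```

    A density this low is consistent with the "No Known Activations" note: a sample would need to contain on the order of 100,000 tokens before a single activating example is expected.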