INDEX
    Explanations
    Negative Logits
        Token               Logit
        " pawn"             -0.07
        " wholesome"        -0.07
        " Entertainment"    -0.06
        " albums"           -0.06
        "/J"                -0.06
        "lesson"            -0.06
        " Masks"            -0.06
        "members"           -0.06
        " portfolio"        -0.06
        " Richt"            -0.06
    Positive Logits
        Token                             Logit
        " sơn"                            0.07
        " rencontre"                      0.06
        (blank)                           0.06
        ".AddTransient"                   0.06
        " plaint"                         0.06
        "$model"                          0.06
        "ât"                              0.06
        "`}↵"                             0.06
        "dux"                             0.06
        "############################"    0.06
    Act Density 0.032%

    No Known Activations