INDEX
    Explanations
    Negative Logits
    Token        Logit
    " Miz"       -0.10
    "dn"         -0.09
    " reconc"    -0.09
    " loos"      -0.09
    " Rica"      -0.09
    " Rico"      -0.08
    " def"       -0.08
    "rien"       -0.08
    "agh"        -0.08
    " laid"      -0.08
    Positive Logits
    Token          Logit
    " character"   0.12
    " role"        0.10
    " focus"       0.10
    " course"      0.10
    "character"    0.09
    " Focus"       0.09
    "role"         0.09
    " Roles"       0.09
    " Character"   0.09
    " Stay"        0.09
    Activation Density: 0.010%

    No Known Activations