INDEX
    Explanations

    expressions of preference or personal liking

    Negative Logits
    Token        Logit
    "нÑĮ"        -0.16
    ".Strict"    -0.15
    "ona"        -0.15
    "velle"      -0.15
    "iro"        -0.15
    "ove"        -0.14
    "appen"      -0.14
    "\Bundle"    -0.14
    "ills"       -0.14
    "iller"      -0.14
    Positive Logits
    Token            Logit
    " nothing"       0.24
    " challenge"     0.22
    " NOTHING"       0.21
    "nothing"        0.19
    "challenge"      0.18
    "488"            0.17
    " Challenge"     0.17
    " challenges"    0.17
    "Challenge"      0.16
    "Nothing"        0.16
    Activation Density: 0.085%

    No Known Activations