INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Logit
    " fome"       -0.08
    " søker"      -0.08
    (blank)       -0.08
    " duet"       -0.08
    " Pey"        -0.07
    "-estar"      -0.07
    "TOT"         -0.07
    "snap"        -0.07
    "BST"         -0.07
    " Lola"       -0.07
    POSITIVE LOGITS
    Token         Logit
    " mindful"    0.11
    " cautious"   0.10
    "准确"        0.10
    " осторож"    0.10
    (blank)       0.10
    " complac"    0.10
    " careful"    0.09
    (blank)       0.09
    " verbose"    0.09
    " subtle"     0.09
    Activation Density: 0.011%

    No Known Activations