INDEX
    Explanations

    Punctuation

    Negative Logits
    .session    -0.08
    Protein     -0.08
    protein     -0.08
    fels        -0.08
    _ext        -0.08
     protein    -0.08
     посвящ     -0.07
    оду         -0.07
    odian       -0.07
     Barrett    -0.07

    Positive Logits
     friendliness    0.08
    落实             0.08
     foster          0.08
     openness        0.08
     budaya          0.08
     tamen           0.08
     ressal          0.08
     dhex            0.08
     sharpen         0.08
     rzecz           0.08
    Activation Density: 0.007%

    No Known Activations