INDEX
    Explanations

    emotional expressions and sentiments related to personal beliefs and moral judgments

    Negative Logits
    "roup"     -0.17
    "olik"     -0.15
    "immers"   -0.15
    "estroy"   -0.15
    " Všech"   -0.14
    "ekil"     -0.14
    "ammers"   -0.14
    "ovan"     -0.14
    " Flake"   -0.14
    "ewise"    -0.14
    Positive Logits
    " Antworten"   0.14
    "jadi"         0.14
    " stir"        0.14
    "GY"           0.14
    "sand"         0.14
    " Clamp"       0.14
    "pus"          0.13
    " bet"         0.13
    "ataset"       0.13
    "sr"           0.13
    Act Density 0.225%

    No Known Activations