    Explanations

    references to moderation or moderate concepts in various contexts

    Negative Logits
    Token       Logit
    eger        -0.16
    nat         -0.16
    ings        -0.16
    forge       -0.16
    fully       -0.15
    roperty     -0.15
    ÑĤÑĶ        -0.14
    ajan        -0.14
    istics      -0.14
    udi         -0.14

    Positive Logits
    Token       Logit
    (Mod        0.20
    /mod        0.19
    (mod        0.18
    .MOD        0.18
    eterminate  0.17
    amente      0.17
    ded         0.17
    .mods       0.17
    éc          0.16
    igli        0.16

    Activation Density: 0.030%

    No Known Activations