INDEX
    Explanations

    phrases related to concerns about problems and allegations in various contexts

    Negative Logits
    rungsseite    -0.83
    transQ        -0.82
    <unused42>    -0.80
    <unused23>    -0.80
    <unused76>    -0.80
    <unused41>    -0.79
    <unused43>    -0.79
    <unused28>    -0.79
    [@BOS@]       -0.79
    <unused8>     -0.79
    Positive Logits
     подоб       0.56
     such        0.52
     solchen     0.52
     solche      0.49
     similar     0.45
     solcher     0.43
    这类        0.43
     like        0.43
    こういう    0.42
    这样的      0.42
    Act Density 0.524%

    No Known Activations