    Explanations

    phrases indicating strong personal opinions or convictions

    Negative Logits
    цездатний            -0.78
    Autoritní            -0.74
    出版年                -0.70
    ſammen               -0.63
    IntoConstraints      -0.62
     تانيه               -0.61
    السكان               -0.60
     Италијани           -0.59
    Personensuche        -0.59
    ViewFeatures         -0.58
    Positive Logits
    [token not shown]    0.52
    The                  0.42
    Is                   0.37
     “                   0.37
    “…                   0.37
    '                    0.36
    [token not shown]    0.36
    <bos>                0.36
    For                  0.36
    "                    0.35
    Activation Density: 0.190%

    No Known Activations