INDEX
    Explanations

    expressions of personal opinions or beliefs

    Negative Logits
    Token            Logit
    " seemingly"     -0.27
    " seem"          -0.25
    " seems"         -0.24
    " nicht"         -0.23
    " seemed"        -0.22
    " không"         -0.21
    " Seems"         -0.21
    " ikke"          -0.20
    " apparently"    -0.20
    " not"           -0.20
    Positive Logits
    Token            Logit
    " fair"          0.20
    "fair"           0.19
    " fairly"        0.17
    "overall"        0.17
    "'].$"           0.16
    " overall"       0.16
    " Fair"          0.16
    "ораль"          0.16
    " mostly"        0.15
    ".safe"          0.15
    Activation Density: 0.205%

    No Known Activations