INDEX
    Explanations

    phrases or words related to subjective judgments or opinions

    complex phrases related to social issues and human experiences

    Negative Logits
    arnaev    -0.61
     confir   -0.56
    ĪĴ        -0.56
     Pok      -0.54
    ãĥ¯ãĥ³    -0.53
     Sorce    -0.53
    arthed    -0.51
     Jagu     -0.51
    Orig      -0.50
     streng   -0.50
    Positive Logits
    ".      2.29
    ",      2.22
    "?      2.16
    ";      2.14
    "!      2.09
    "       2.02
    ":      2.01
    "...    1.99
    "â̦     1.99
    ".[     1.98
    Activation Density: 0.481%

    No Known Activations