INDEX
    Explanations

    mentions of disapproval or criticism

    a specific character or symbol in the text

    Negative Logits
     deed            -0.69
     tabl            -0.69
     Norn            -0.68
     apes            -0.68
     Slug            -0.67
     prefrontal      -0.65
     telesc          -0.64
     pleasures       -0.64
     condem          -0.63
     Directorate     -0.62
    Positive Logits
    ï¸ı               1.09
    âĶĢâĶĢ            0.95
    ternity           0.92
    lean              0.92
    ever              0.91
    âĸł               0.87
    -                 0.85
    conom             0.84
    ··                0.83
    very              0.83
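
    Several positive-logit tokens (ï¸ı, âĶĢâĶĢ, âĸł) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below shows one way to turn such strings back into text. It assumes the tokenizer uses the GPT-2 style byte-to-character table, which this listing does not confirm. The helper names `bytes_to_unicode`, `UNICODE_TO_BYTE`, and `decode_token` are illustrative only and are not part of the dashboard.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table, as used by GPT-2 style byte-level BPE.

    Assumption: the tokens in the listing come from a tokenizer that uses
    this table. Printable bytes map to themselves; every other byte is
    assigned a character starting at code point 256, in byte order.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    mapping = {b: chr(b) for b in printable}
    extra = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + extra)
            extra += 1
    return mapping


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Convert a displayed vocabulary string back into readable text.

    Characters outside the table (for example a plain space that the
    dashboard has already decoded) are passed through unchanged.
    Byte sequences that are not valid UTF-8 become U+FFFD.
    """
    raw = bytearray()
    for ch in display:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for token in ["âĶĢâĶĢ", "âĸł", "ï¸ı", " deed"]:
        decoded = decode_token(token)
        print(repr(token), "->", [f"U+{ord(c):04X}" for c in decoded])
```

    Under that assumption, the three odd-looking entries would be box-drawing, geometric-shape, and variation-selector characters rather than corrupted text. If the underlying model uses a different tokenizer, the table above does not apply.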
    Act Density 0.200%

    No Known Activations