INDEX
    Explanations

    statements that express personal opinions or experiences

    Negative Logits
    icho              -0.18
    yo                -0.18
    ovel              -0.15
    heads             -0.15
    ially             -0.15
    hma               -0.14
    yum               -0.14
    ummies            -0.14
    alse              -0.14
    \CMS              -0.14
    Positive Logits
    ashi              0.18
     aim              0.17
    self              0.17
    .scalablytyped    0.17
    ron               0.17
    abe               0.16
    riad              0.16
    Ìĥ                0.16
    rna               0.15
    ri                0.15
    Activation Density 0.073%

    No Known Activations