INDEX
    Explanations

    negative phrases related to approvals or conditions

    Negative Logits
    Ŀ          -0.18
    peri       -0.15
    ÙĬز        -0.14
    å£         -0.14
    croft      -0.14
    æľ¯        -0.14
    ÛĮز        -0.14
     Äijá»Ŀi   -0.14
    osaurs     -0.14
    ạc         -0.14
    Positive Logits
     latter    0.18
    ouro       0.17
     itself    0.15
     MAD       0.15
     Ging      0.15
    ENCE       0.15
     indeed    0.15
     meaning   0.15
     himself   0.14
     wash      0.14
    Act Density 0.149%

    No Known Activations