    Explanations

    phrases related to evidence and support for claims

    Negative Logits
    "098"    -0.18
    "099"    -0.16
    " lif"   -0.15
    "etat"   -0.14
    "lse"    -0.14
    "ivor"   -0.14
    "æk"     -0.14
    "ÅĻe"    -0.14
    " Cad"   -0.14
    "edd"    -0.14
    Positive Logits
    " Ulus"      0.16
    "abd"        0.15
    "hana"       0.15
    " Giang"     0.14
    "arez"       0.14
    "uto"        0.14
    " grounds"   0.14
    "otomy"      0.14
    "ccoli"      0.14
    "beg"        0.14
    Act Density 0.320%

    No Known Activations