INDEX
    Explanations

    phrases related to assertions and claims, particularly regarding beliefs and statements

    Negative Logits
    " they"      -0.21
    " we"        -0.20
    " you"       -0.17
    "they"       -0.16
    " it"        -0.15
    "otron"      -0.15
    " они"       -0.15   (Russian "they")
    "you"        -0.15
    " Logic"     -0.15
    " someone"   -0.14
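    Positive Logits
    " that"      0.28
    " rằng"      0.27   (Vietnamese "that")
    " bahwa"     0.25   (Indonesian "that")
    " ότι"       0.24   (Greek "that")
    " daß"       0.24   (German "that", older spelling)
    " dass"      0.24   (German "that")
    "\tthat"     0.23   (tab-prefixed "that")
    "that"       0.22
    " että"      0.22   (Finnish "that")
    " že"        0.21   (Czech/Slovak "that")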
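    The dashboard lists these logits but does not say how they were computed. One common convention for sparse-autoencoder features, assumed here and not confirmed by this page, is direct logit attribution: take the dot product of the feature's decoder direction with each token's unembedding vector, then report the most negative and most positive tokens. The sketch below uses made-up two-dimensional vectors; `logit_effects` and `top_and_bottom` are hypothetical helper names.

```python
# Hypothetical sketch of direct logit attribution for one feature.
# Assumption: logits = feature decoder direction . token unembedding vector.
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy 2-dimensional vectors, invented for illustration only.
direction = [1.0, -0.5]
unembed = {
    " that": [0.3, 0.0],
    " dass": [0.2, -0.1],
    " they": [-0.1, 0.2],
    " we": [0.0, 0.3],
}
positive, negative = top_and_bottom(logit_effects(direction, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

    Sorting once and slicing from both ends keeps the two lists consistent with each other; a real vocabulary would use matrix multiplication instead of a Python loop.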
    Activation Density: 0.266%
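    Activation density is usually reported as the share of token positions on which the feature fires. A minimal sketch follows, assuming "fires" means an activation strictly above zero (the threshold is an assumption, and `activation_density` is a hypothetical name). By that reading, 0.266% means the feature is active on roughly 2.66 of every 1,000 tokens.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy data: 1,000 token positions, 3 of which activate the feature.
acts = [0.0] * 997 + [1.2, 0.4, 2.0]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 0.300%"
```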

    No Known Activations