INDEX
    Explanations

    conjunctions then new concepts

    Negative Logits
    " nejen"            0.25
    " there"            0.24
    " says"             0.24
    " What"             0.23
    " {"                0.22
    " they"             0.22
    " ilyen"            0.22
    " vorige"           0.22
    "وت"                0.21
    " hva"              0.21

    Positive Logits
    " unwillingness"    0.50
    " inability"        0.44
    " unwavering"       0.43
    " willingness"      0.41
    " reliance"         0.41
    " отсутствие"       0.41
    "良好的"            0.38
    " unrelenting"      0.37
    " reluctance"       0.36
    " возможность"      0.36
    Activation Density: 0.321%

    No Known Activations