INDEX
    Explanations

    numbered or bulleted lists

    Negative Logits
    nonumber       0.38
     unworthy      0.36
     όχι           0.36
     ikke          0.35
     hilfreich     0.35
     falso         0.34
     není          0.34
     nejsou        0.33
     இல்லாத        0.33
     falsehood     0.33
    Positive Logits
     затем         0.53
     then          0.43
     prepare       0.43
     Затем         0.41
    调整           0.40
     następnie     0.40
    然后           0.39
     przygot       0.39
     vervolgens    0.39
     tweaked       0.39
    Act Density 0.336%

    No Known Activations