    Explanations

    participate, submit, university, admire

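    Negative Logits
      Token        Logit   Gloss
      signifikan   1.13    Indonesian/Malay "significant"
      dapat        1.09    Indonesian/Malay "can, be able to"
      +            1.09
      harus        1.08    Indonesian/Malay "must"
      sering       1.08    Indonesian/Malay "often"
      akan         1.08    Indonesian/Malay "will"
      vocab        1.06
      niet         1.05    Dutch "not"
      nicht        1.04    German "not"
      conjunct     1.02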
    Positive Logits
      1.66, 1.51, 1.44, 1.36, 1.35, 1.35, 1.33, 1.32, 1.31, 1.30
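    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and then ranking tokens by the resulting direct effect on their logits. The following is a minimal sketch that assumes this method, since the page does not say how its values were computed. `logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and plain Python lists stand in for real weight tensors.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical layout: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k promoted tokens, top-k suppressed tokens).

    Assumption: the direct logit effect of a feature on token t is the dot
    product of its decoder direction with column t of the unembedding matrix.
    """
    d_model = len(decoder_dir)
    # One score per vocabulary entry: decoder_dir . unembed[:, t]
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score, so the most suppressed tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effects first
    positive = ranked[::-1][:k]  # most positive effects first
    return positive, negative
```

    With a 2-dimensional toy model and a three-token vocabulary, `logit_effects([1.0, 0.0], [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)` ranks "a" as the most promoted token and "b" as the most suppressed one.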
    Activation Density: 0.052%
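    Activation density is the share of sampled token positions on which the feature fires. The following is a minimal sketch that assumes "fires" means an activation above a hypothetical threshold of 0, because the page states neither the threshold nor the token sample. Under that reading, 0.052% corresponds to about 52 active positions in every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    `threshold=0.0` is an assumption; the source does not state one.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)
```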

    No Known Activations