INDEX
    Explanations

    teaching activities

    Negative Logits
    ptune           -0.09
     unanimously    -0.07
    纽带              -0.07
    ertainty        -0.07
    uter            -0.07
    klär            -0.07
    داف             -0.06
    座谈会             -0.06
                    -0.06
    Okay            -0.06
    Positive Logits
     blonde         0.08
     diss           0.08
     linux          0.07
    _pool           0.07
    Pure            0.07
     Stem           0.07
    EMU             0.07
    	price          0.07
    -producing      0.07
     זוכר           0.07
    Activation Density: 0.026%

    No Known Activations