INDEX
    Explanations

    comments and optional explanations

    NEGATIVE LOGITS
    " firstly"  0.38
    "//"  0.38
    "Firstly"  0.37
    " emergent"  0.36
    "まずは"  0.35
    "Drum"  0.34
    (token not shown)  0.33
    " تط"  0.32
    "一方面"  0.32
    "Karl"  0.32
    POSITIVE LOGITS
    " Puedes"  0.64
    " možete"  0.59
    " Uncomment"  0.57
    " Optionally"  0.57
    " ඔබට"  0.57
    "below"  0.57
    "你可以"  0.56
    " 아래"  0.55
    " możesz"  0.55
    " ניתן"  0.54
    Act Density: 0.006%

    No Known Activations