INDEX
    Explanations

    transitioning to action

    Negative Logits
     Angaben                 1.11
    Wheat                    0.90
    Tall                     0.89
    [token not rendered]     0.88
     확실                    0.87
    [token not rendered]     0.86
    ঁপ                       0.84
    ինչ                      0.83
     иначе                   0.83
    Human                    0.81
    Positive Logits
    [token not rendered]     1.17
     desse                   1.16
     dessa                   1.11
    ت                        1.10
    تهم                      1.08
    gressive                 1.07
    imposed                  1.05
    iscono                   1.05
     essas                   1.05
    averaged                 1.04
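The page does not say how these logit lists were produced. A common approach in sparse-autoencoder dashboards, assumed here and not confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the largest (positive) and smallest (negative) scores. The sketch below uses a made-up 4-token vocabulary; `logit_effects`, `top_and_bottom`, and all the numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature direction (length d_model) through a hypothetical
    unembedding matrix w_u of shape (d_model, vocab_size).
    Returns one score per vocabulary token."""
    d_model, vocab_size = len(w_u), len(w_u[0])
    return [sum(direction[i] * w_u[i][j] for i in range(d_model)) for j in range(vocab_size)]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens (largest first) and the
    k lowest-scoring tokens (most negative first)."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])
    k = min(k, len(order))  # clamp so slicing stays correct for large k
    positive = [(vocab[j], effects[j]) for j in reversed(order[len(order) - k:])]
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["a", "b", "c", "d"]
direction = [1.0, -0.5]
w_u = [[0.2, 1.0, -0.3, 0.5],
       [0.4, -0.2, 0.6, 0.0]]
positive, negative = top_and_bottom(logit_effects(direction, w_u), vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this assumption, a token appears in the positive list when the feature pushes its logit up and in the negative list when it pushes it down; the dashboard above appears to show magnitudes only.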
    Activation Density: 0.000%

    No Known Activations
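Activation density is usually read as the percentage of sampled token positions on which the feature fires. The sketch below assumes that definition and a firing threshold of 0.0; both are assumptions, and `activation_density` is a hypothetical helper. It shows why a value printed to three decimals as 0.000% does not by itself prove the feature never fires.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds threshold.
    The default threshold of 0.0 is an assumption, not taken from the dashboard."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# A feature that never fires in the sample has density exactly zero.
never = [0.0] * 10_000
# One firing in a million tokens is 0.0001%, which also rounds to 0.000%.
rare = [0.0] * 999_999 + [3.2]

print(f"{activation_density(never):.3f}%")  # prints 0.000%
print(f"{activation_density(rare):.3f}%")   # prints 0.000%
```

Both cases display as 0.000%, so the "No Known Activations" line is what indicates that no activations were recorded in whatever sample the dashboard used.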