INDEX
    Explanations

    expressions related to objectives or intended outcomes

    NEGATIVE LOGITS
    NameInMap          -0.56
    -                  -0.53
     disambiguazione   -0.51
    k                  -0.50
    さまで              -0.50
     Stellung          -0.49
    type               -0.48
    しまう              -0.48
    دارة               -0.46
    してしまう           -0.45
    POSITIVE LOGITS
     aimed             0.98
     nhằm              0.95
    เพื่อ               0.82
     aim               0.82
     aiming            0.81
    目的是              0.81
     bertujuan         0.81
     bedoeld           0.80
     aims              0.80
    旨在                0.80
    Activation Density: 0.247%

    No Known Activations