INDEX
    Explanations

    phrases indicating possession or existence

    Negative Logits
    763
    -0.15
    kom
    -0.14
    ه
    -0.14
    ppo
    -0.14
    ipeg
    -0.14
    f
    -0.14
    선
    -0.14
     Duy
    -0.14
     vậy
    -0.14
    ayım
    -0.13
    Positive Logits
     why
    0.31
     how
    0.27
     where
    0.23
    why
    0.22
     what
    0.22
     precisely
    0.22
    为什么
    0.19
    how
    0.18
     exactly
    0.17
     true
    0.17
    Act Density 0.090%

    No Known Activations