INDEX
    Explanations

    political territory size

    Negative Logits
    " ape"         -0.07
    "Scr"          -0.07
    "deki"         -0.07
    " canoe"       -0.07
    " intrigued"   -0.06
    " unos"        -0.06
    "Born"         -0.06
    "!--"          -0.06
    " nir"         -0.06
    "nim"          -0.06
    Positive Logits
    "bau"          0.07
    " reshape"     0.07
    " rủi"         0.06
    " portrayed"   0.06
    (not shown)    0.06
    "ограм"        0.06
    "통신"         0.06
    " Roosevelt"   0.06
    " automatic"   0.06
    (not shown)    0.06
    Activation Density: 0.027%

    No Known Activations