    Negative Logits
        " fatos"       -0.08
        " práticas"    -0.08
        " faktor"      -0.08
        "事实"          -0.08
        " melhores"    -0.08
        " Kath"        -0.08
        "\ttable"      -0.08
        " grounding"   -0.07
        " Facts"       -0.07
        " IDs"         -0.07
    Positive Logits
        "verb"          0.08
        " цвета"        0.08
        " caught"       0.08
        " exhibited"    0.07
        " Ms"           0.07
        " gull"         0.07
        "ად"            0.07
        " Razor"        0.07
        "ғым"           0.07
                        0.07
    Activation density: 0.001%

    No known activations.