INDEX
    Explanations

    references to particular groups within specific historical contexts

    Negative Logits
    procs         -0.16
    hood          -0.15
    oles          -0.14
     outstanding  -0.14
    isle          -0.14
    hora          -0.14
    filer         -0.14
    lua           -0.14
    ãĥªãĤ«        -0.14
    unning        -0.13
    Positive Logits
    ze            0.55
    z             0.54
    zer           0.51
    zen           0.50
    zes           0.47
    zt            0.46
    zy            0.44
    zs            0.40
    zb            0.40
    zk            0.39
    Act Density 0.056%

    No Known Activations