INDEX
    Explanations

    terms associated with classifications and categories

    Negative Logits
     ir      -0.17
     iii     -0.17
     ii      -0.17
     ib      -0.17
     ip      -0.17
     ig      -0.16
    TRGL     -0.16
     ia      -0.15
     ic      -0.15
    (ic      -0.15
    Positive Logits
    I        0.35
    IS       0.35
    IC       0.35
    İ        0.35
    Ðĺ       0.35
    IF       0.34
    IO       0.33
    I        0.33
    Ãį       0.33
    IA       0.33
    Act Density 0.120%

    No Known Activations