    Explanations

    words related to perception and understanding in relation to concepts, likely linked to the arts or literature

    Negative Logits
     Bakan     -0.42
     Cientí    -0.42
     busto     -0.41
     swamps    -0.40
     Danish    -0.40
     Finnish   -0.40
     parma     -0.40
     husband   -0.39
     Czech     -0.39
     Cze       -0.39
    Positive Logits
     im              0.67
     mit             0.65
     und             0.64
     sowie           0.56
     am              0.53
     einschließlich  0.51
     für             0.51
     zuständig       0.50
     zum             0.49
     aus             0.49
    Activation Density 0.246%

    No Known Activations