INDEX
    Explanations

    instances of significant nouns and their attributes

    Negative Logits
    Token            Logit
    terior           -0.15
    ilk              -0.15
    イス             -0.14
    irt              -0.14
     Resident        -0.13
     standards       -0.13
    etta             -0.13
    iences           -0.13
    ien              -0.13
    ugar             -0.13

    Positive Logits
    Token            Logit
    ayo              0.15
    /moment          0.15
    ismet            0.15
    hora             0.15
    agli             0.15
    ibold            0.14
    InstanceState    0.14
    αν               0.14
    aln              0.14
    yms              0.14
    Activation Density: 0.041%

    No Known Activations