INDEX
    Explanations

    mentions of specific names and titles, particularly in the context of academic or scientific references

    Negative Logits
    ùa        -0.21
    asons     -0.19
    aven      -0.19
    aker      -0.17
    outh      -0.17
    IME       -0.17
    indre     -0.17
    ensa      -0.16
    edi       -0.16
    oria      -0.16

    Positive Logits
    ched      0.26
    eller     0.19
    ee        0.18
    cc        0.17
    roz       0.17
    ches      0.17
    ow        0.17
    lod       0.16
     incon    0.16
    cd        0.16

    Act Density 0.040%

    No Known Activations