INDEX
    Explanations

    references to academic disciplines or fields of study

    Negative Logits
    lla         -0.15
    illage      -0.14
    .Pattern    -0.14
    005         -0.14
    ubble       -0.14
    è¥          -0.13
    bee         -0.13
    agan        -0.13
    hab         -0.13
    anco        -0.13
    Positive Logits
    KIT         0.16
    viar        0.15
    adol        0.15
    unn         0.15
    ĩ           0.15
    inas        0.14
    9           0.13
    8           0.13
    ROTO        0.13
    anja        0.13
    Act Density 0.003%

    No Known Activations