INDEX
    Explanations

    phrases indicating relationships or connections

    Negative Logits
        aho       -0.17
        tru       -0.14
        Ps        -0.14
        /gin      -0.14
        ells      -0.13
         bow      -0.13
        atra      -0.13
        5         -0.13
        ARC       -0.13
        berger    -0.13
    Positive Logits
        ίδ         0.15
        ITIES      0.15
        okens      0.14
        iktig      0.13
        erli       0.13
        gba        0.13
        @stop      0.13
        aklı       0.13
        uibModal   0.13
        enci       0.13
    Activation Density: 0.209%

    No Known Activations