INDEX
    Explanations

    phrases indicating a conversational or relational context

    Negative Logits
     newPos     -0.15
     newX       -0.14
    avec        -0.14
     Abram      -0.14
     neutral    -0.14
    iques       -0.14
    usalem      -0.14
     Fog        -0.14
     LLP        -0.13
    ê´Ģ         -0.13
    Positive Logits
     note        0.15
    ductive      0.15
    ften         0.15
    asta         0.14
    OLON         0.14
    UED          0.14
    EMALE        0.14
    ensis        0.14
    reff         0.14
    CDF          0.14
    Act Density 0.018%

    No Known Activations