INDEX
    Explanations

    affirmations or agreements in conversations

    Negative Logits
    Token              Logit
    ".fromFunction"    -0.15
    "ickerView"        -0.14
    " dipl"            -0.14
    "dum"              -0.14
    "posable"          -0.14
    "_ASSUME"          -0.14
    "prar"             -0.14
    " Dummy"           -0.13
    " Milton"          -0.13
    "KER"              -0.13
    Positive Logits
    Token              Logit
    " vice"            0.16
    "ouse"             0.15
    "eo"               0.14
    "ifi"              0.14
    " Vice"            0.14
    "vla"              0.14
    "lobs"             0.13
    " dün"             0.13
    "atoi"             0.13
    ".Helper"          0.13
    Activation Density: 0.035%

    No Known Activations