INDEX
    Explanations

    phrases indicating affection and appreciation towards others

    Negative Logits
    "раздо"            -0.58
    " utafitiHapana"   -0.54
    " لئے"             -0.54
    "retum"            -0.53
    "viembre"          -0.53
    "ArrowToggle"      -0.53
    " sive"            -0.52
    "ここでは"         -0.52
    "CppCodeGen"       -0.52
    " bParam"          -0.51
    Positive Logits
    " actually"        0.76
    " ACTUALLY"        0.73
    " LOTS"            0.72
    " really"          0.70
    " REALLY"          0.70
    " definitely"      0.68
    " AWESOME"         0.68
    "actually"         0.67
    "yntaxException"   0.66
    " pretty"          0.65
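    Feature dashboards commonly derive positive and negative logit lists like the ones above by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the tokens at either extreme. The sketch below assumes that definition. The function names `logit_effects` and `top_and_bottom` are hypothetical, the unembedding is modeled as a plain token-to-vector mapping, and any normalization a real dashboard may apply (for example final layer-norm scaling) is ignored.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly one feature's
# decoder direction pushes their logits up or down.
# Assumption: logit effect of token t = dot(decoder_dir, unembedding column of t).
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's
    unembedding vector (assumed layout: token -> d_model vector)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (positive, negative): the k most promoted tokens, largest first,
    and the k most suppressed tokens, most negative first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; tokens and values are illustrative only.
    effects = logit_effects([1.0, 0.0], {
        "a": [0.9, 0.3], "b": [0.5, -0.2], "c": [-0.4, 0.1], "d": [-0.7, 0.0],
    })
    pos, neg = top_and_bottom(effects, 2)
    print(pos)  # tokens "a" then "b"
    print(neg)  # tokens "d" then "c"
```

    Under this definition the sign of an entry only says whether the feature, when active, raises or lowers that token's logit directly; it says nothing about which inputs make the feature fire.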
    Activation Density 0.187%
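    Activation density is typically the percentage of token positions on which the feature's activation is nonzero, so 0.187% corresponds to roughly 187 of every 100,000 tokens. The sketch below assumes that definition with a strict threshold of 0; the function name `activation_density` and its `threshold` parameter are hypothetical, not taken from any specific dashboard.

```python
# Hypothetical sketch: activation density as the percentage of token positions
# whose feature activation exceeds a threshold (assumed 0.0 by default).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of positions where activation > threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 2 active positions out of 1,000 gives a density of 0.2%.
    acts = [0.0] * 998 + [1.2, 0.4]
    print(activation_density(acts))
```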

    No Known Activations