INDEX
    Explanations

    phrases indicating potential actions or capabilities

    Negative Logits
     chi̍t              -0.54
     unknownFields     -0.53
    buttonShape        -0.52
    aimerais           -0.52
    Rüyada             -0.49
    hoenix             -0.48
    posedge            -0.48
    好きです           -0.47
    大好きです         -0.47
     bahkan            -0.47
    Positive Logits
     möglichst         0.82
     puissiez          0.79
     becomes           0.73
     can               0.72
     بتوان             0.70
     easier            0.68
     possa             0.68
    就不会             0.68
    才會               0.65
     become            0.64
    Act Density 0.171%

    No Known Activations