    Explanations

    requests for help or assistance

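    Negative Logits
    (Tokens are quoted; a leading space inside the quotes is part of the token.)
    "さは"            -0.85
    "ובי"            -0.81
    "讓人"            -0.80
    "promote"        -0.80
    " jullie"        -0.79
    "アンサー"         -0.79
    [token not shown] -0.79
    "tiens"          -0.79
    " paramètres"    -0.78
    "Ded"            -0.77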
    Positive Logits
    " help"          1.81
    " assistance"    1.47
    "help"           1.27
    " guidance"      1.26
    " advice"        1.16
    " request"       0.97
    " Assistance"    0.96
    " Hilfe"         0.95
    "Help"           0.94
    "delige"         0.94
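    The positive and negative logits list the tokens whose output logits this feature most directly raises or lowers. This page does not say how the lists are computed. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the result. The helper names `feature_logit_effects` and `top_logits` and all the numbers below are hypothetical. The sketch uses plain Python lists.

```python
# Sketch under an assumption: a feature's direct effect on the output logits is
# its decoder direction dotted with each token's unembedding vector.
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Map each token to the dot product of the decoder direction with its unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = list(reversed(ranked))[:k]
    negative = ranked[:k]
    return positive, negative


# Toy, hypothetical numbers (not the values listed above).
decoder_dir = [1.0, 0.5]
unembed = {
    " help": [1.5, 0.6],
    " advice": [1.0, 0.3],
    "さは": [-0.5, -0.7],
    "tiens": [-0.4, -0.8],
}
positive, negative = top_logits(feature_logit_effects(decoder_dir, unembed), k=2)
print(positive)  # " help" first, then " advice"
print(negative)  # "さは" first, then "tiens"
```

    In a real model, the decoder direction and unembedding vectors have the model's hidden size, and the ranking runs over the full vocabulary. The toy version only illustrates the ranking step.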
    Activation density: 0.081%
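    Activation density is the share of tokens on which the feature is active. At 0.081%, that is roughly 81 firing tokens per 100,000. This sketch assumes "active" means an activation strictly above zero, and it exposes the threshold as a parameter. The helper name `activation_density` and the sample data are hypothetical.

```python
# Sketch: activation density as the fraction of tokens whose activation exceeds
# a threshold. Treating "active" as "> 0" is an assumption.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly above `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Hypothetical sample: 81 firing tokens out of 100,000.
sample = [0.0] * (100_000 - 81) + [2.3] * 81
density = activation_density(sample)
print(f"{density:.3%}")  # 0.081%
```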

    No Known Activations