INDEX
    Explanations

    trusted friends or family for help

    Negative Logits
     강조  0.44
     важно  0.42
     uniqueness  0.42
     tantrums  0.41
     नंबर्स  0.41
    连续  0.41
     satire  0.41
    強調  0.41
     emphasise  0.41
     vurg  0.41
    Positive Logits
     nearby  0.61
    帮忙  0.61
     trusted  0.61
     trustworthy  0.57
     помощь  0.56
    友人  0.56
     సహాయ  0.56
     મદદ  0.56
     hjel  0.56
     도움  0.55
    Activation Density: 0.086%

    No Known Activations