INDEX
    Explanations

    frequency and preference

    Negative Logits
    " erneut"            0.54
    "我们要"              0.43
    "desired"            0.43
    " correttamente"     0.41
    " непосредственно"   0.40
    " sucesivamente"     0.40
    " অনিশ্চ"             0.40
    " opnieuw"           0.40
    " Contains"          0.39
    "hopefully"          0.39
    Positive Logits
    " prefer"            0.93
    " rarely"            0.91
    " предпочита"        0.88
    " prefers"           0.83
    " seldom"            0.80
    " routinely"         0.79
    " occasionally"      0.76
    " Prefer"            0.75
    "Occasionally"       0.74
    "prefer"             0.70
    Activation Density: 0.020%

    No Known Activations