    Negative Logits
    Token               Logit
    " information"      0.52
    " anything"         0.49
    " clarification"    0.44
    " Clar"             0.40
    " Help"             0.40
    "Help"              0.40
    " informasjon"      0.40
    " Information"      0.39
    "信息"              0.39
    " help"             0.38
    Positive Logits
    Token               Logit
    " FOLLOW"           0.49
    "follow"            0.48
    " follow"           0.47
    " تابع"             0.43
    "Follow"            0.43
    "FOLLOW"            0.41
    " followup"         0.40
    "follows"           0.39
    "フォロー"          0.38
    " сле"              0.37
    Activation Density: 0.014%

    No Known Activations