INDEX
    Explanations
    NEGATIVE LOGITS
     hopes
    0.85
     donna
    0.84
     glad
    0.81
     hope
    0.78
     unable
    0.77
     recommending
    0.76
     purposely
    0.76
     joke
    0.76
     obligé
    0.75
    0.75
    POSITIVE LOGITS
    Effective
    0.88
    Flexible
    0.85
     flexible
    0.83
    flexible
    0.80
    "?:
    0.75
     Czy
    0.72
    どのような
    0.71
     effective
    0.71
    テンツ
    0.70
     ගැනීම
    0.70
    Activation Density 0.084%

    No Known Activations