INDEX
    Explanations
    Negative Logits
    FRAG
    -0.96
     BROS
    -0.83
     responded
    -0.76
     rewarding
    -0.76
     unrealistic
    -0.74
     transferred
    -0.73
     insight
    -0.72
     profan
    -0.72
    ungkinan
    -0.72
     改
    -0.71
    Positive Logits
     nucle
    0.86
     arché
    0.84
    Mosa
    0.84
    DOCK
    0.82
     homomorphism
    0.80
     összes
    0.80
    專輯
    0.79
    0.77
    ųjų
    0.77
    ställ
    0.77
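    The two lists above pair each vocabulary token with a logit value. Below is a minimal sketch of how such lists are commonly produced for sparse-autoencoder features. It assumes that the values are the feature's direct effect on the output logits, computed as the decoder row (W_dec) projected through the unembedding matrix (W_U); the dashboard itself does not say this. The function names, matrix shapes and toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logits(decoder_vec: Sequence[float],
                   unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding matrix.

    decoder_vec: length d_model (assumed: one row of W_dec).
    unembed: d_model x vocab_size matrix (assumed: W_U), as nested lists.
    Returns one direct logit contribution per vocabulary token.
    """
    d_model = len(decoder_vec)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]


def top_and_bottom(logits: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda v: logits[v])  # ascending
    negative = [(vocab[v], logits[v]) for v in order[:k]]
    positive = [(vocab[v], logits[v]) for v in order[::-1][:k]]
    return negative, positive


# Toy example (hypothetical numbers): d_model = 2, vocabulary of three tokens.
W_dec_row = [1.0, -1.0]
W_U = [[0.5, 0.0, 2.0],
       [0.0, 1.0, 0.5]]
logits = feature_logits(W_dec_row, W_U)
neg, pos = top_and_bottom(logits, ["a", "b", "c"], k=1)
print(neg, pos)  # [('b', -1.0)] [('c', 1.5)]
```

    Sorting all token indices once and then slicing from both ends yields both lists from a single pass. A real vocabulary can contain leading-space tokens such as " BROS", so any display should keep token strings exactly as they are.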
    Act Density 0.011%
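    Activation density is usually reported as the percentage of tokens on which the feature fires. A density of 0.011% works out to about 11 active tokens per 100,000. The sketch below assumes that "active" means an activation strictly greater than zero. The threshold parameter and function name are hypothetical, and the dashboard's exact definition may differ.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 11 active tokens out of 100,000 gives a density of 0.011%.
acts = [0.0] * 100_000
for i in range(11):
    acts[i * 9_000] = 1.0
print(round(activation_density(acts), 3))  # 0.011
```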

    No Known Activations