INDEX
    Explanations

    names and mentions of individuals in various contexts

    Negative Logits
     itſelf         -0.88
     pleaſure       -0.82
     ſever          -0.82
     Majefty        -0.81
     purpoſe        -0.81
    tagHelperRunner -0.79
    neſs            -0.79
     متعلقه         -0.78
     themſelves     -0.77
     myſelf         -0.74
    Positive Logits
    N               0.48
                    0.42
    ..              0.41
    Phương          0.41
     N              0.40
    brand           0.39
    ...             0.39
    有名            0.39
    T               0.38
                    0.38
    Act Density 0.379%

    No Known Activations