INDEX
    Explanations

    instances of names, specifically focusing on proper nouns and entities

    Negative Logits
     behav            -0.76
     disadvant        -0.71
     distingu         -0.70
     misunder         -0.70
     Ender            -0.69
     escape           -0.65
     Reply            -0.65
     independ         -0.64
     AB               -0.64
     Ichigo           -0.64
    Positive Logits
    EStreamFrame      0.98
    milo              0.92
    ForgeModLoader    0.90
    Ń·                0.86
    ola               0.85
     Plaza            0.81
    810               0.81
    ilogy             0.80
     TAMADRA          0.80
    cci               0.79
    Act Density 0.143%

    No Known Activations