INDEX
    Explanations

    complex discussions about race and cultural identity

    statements about gameplay experiences

    questioning and speculative phrases

    Negative Logits
    Token           Logit
    " fabulous"     -0.61
    "Whew"          -0.57
    "!"             -0.55
    " terrific"     -0.52
    " lovely"       -0.51
    " wonderful"    -0.51
    "とっても"      -0.50
    "fabulous"      -0.49
    "なかなか"      -0.48
    "!)."           -0.47
    Positive Logits
    Token            Logit
    "/="             0.92
    " objectively"   0.91
    " Lmao"          0.81
    " lmao"          0.78
    " subjective"    0.76
    " argumento"     0.75
    " idk"           0.74
    " Idk"           0.74
    "Referències"    0.73
    "Idk"            0.73
    Activation Density: 0.134%

    No Known Activations