INDEX
    Explanations

    words or phrases that indicate a ranking or rating

    Negative Logits
    Token (quoted; leading space is part of the token)  Logit
    "ãģķãģĦ"  -0.16
    "جا"  -0.15
    "BAB"  -0.14
    "äch"  -0.14
    "hem"  -0.14
    "scal"  -0.14
    "Oak"  -0.14
    "bab"  -0.13
    "upy"  -0.13
    " Ney"  -0.13
    Positive Logits
    Token (quoted; leading space is part of the token)  Logit
    " Persona"  0.33
    "Persona"  0.26
    " persona"  0.26
    " golden"  0.25
    " Golden"  0.25
    " Person"  0.24
    "persona"  0.24
    "Golden"  0.22
    "golden"  0.22
    " Velvet"  0.21
    Activation Density: 0.000%

    No Known Activations