INDEX
    Explanations

    mentions of relationships and social connections, particularly focusing on friends and family

    Negative Logits
    " Biele"                -0.86
    " LoggerFactory"        -0.83
    " Hv"                   -0.81
    " Theaters"             -0.81
    (token not rendered)    -0.80
    " Kae"                  -0.79
    "Còn"                   -0.78
    "'));\n"                -0.75
    "scott"                 -0.74
    "`,\n"                  -0.74
    Positive Logits
    " friends"    2.05
    " Friends"    1.95
    "friends"     1.93
    "Friends"     1.84
    " FRIENDS"    1.81
    " Friend"     1.62
    " friend"     1.61
    "Friend"      1.56
    " FRIEND"     1.45
    "friend"      1.36
    Activation Density: 0.042%

    No Known Activations