INDEX
Explanations
mentions of specific names and roles within a team or organization
references to a specific player and their contributions or role within a team
Negative Logits
censored      -0.87
dunno         -0.86
complains     -0.84
punishments   -0.84
pretended     -0.82
complaining   -0.81
complain      -0.81
toggle        -0.80
trope         -0.80
shit          -0.79
Positive Logits
thrilled      0.93
partnering    0.92
exciting      0.91
partnership   0.90
welcoming     0.87
"}],"         0.87
unparalleled  0.84
proud         0.84
tremendous    0.83
avering       0.82
Activations Density 0.911%