INDEX
Explanations
phrases related to military service, particularly focusing on transgender personnel
references to specific sports teams
Negative Logits
hens        -0.84
xual        -0.77
thren       -0.76
hend        -0.74
osite       -0.65
idem        -0.65
urion       -0.65
philosoph   -0.64
heon        -0.64
hent        -0.64
Positive Logits
Reward        0.81
inctions      0.73
Instruments   0.72
suscept       0.71
WARE          0.66
Surviv        0.65
Carnage       0.65
Tracker       0.65
ROR           0.63
Ign           0.61
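A minimal sketch of how ranked logit lists like the two above can be produced. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical and do not reproduce the values listed here.

```python
# Hypothetical sketch: derive "negative" and "positive" logit lists for one
# feature, ASSUMING each token's score is the dot product of the feature's
# decoder direction with that token's unembedding vector.

def logit_effects(decoder_dir, unembed):
    """Score every token by decoder_dir . unembed[token].

    decoder_dir: list of floats, length d_model (hypothetical input).
    unembed: dict mapping token string -> list of floats, length d_model.
    """
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return (k most negative, k most positive) lists of (token, score)."""
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional example; these numbers are made up for illustration.
    decoder_dir = [1.0, -0.5]
    unembed = {
        "Reward": [0.6, -0.4],
        "hens": [-0.7, 0.3],
        "the": [0.1, 0.2],
    }
    effects = logit_effects(decoder_dir, unembed)
    negative, positive = top_and_bottom(effects, k=1)
    print("Negative:", negative)
    print("Positive:", positive)
```

Sorting twice rather than slicing from the end of one sorted list keeps `k=0` well behaved (both lists come back empty).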
Activations Density: 0.000%
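A minimal sketch of activation density, assuming it means the fraction of sampled tokens on which the feature's activation exceeds a threshold (taken here as 0). The function name `activation_density` and the threshold are hypothetical; the page does not state how its figure is computed or rounded.

```python
# Hypothetical sketch: activation density as the share of tokens on which
# the feature fires, ASSUMING "fires" means activation > threshold.

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`."""
    if not activations:
        return 0.0  # no samples: report zero rather than divide by zero
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    sample = [0.0, 0.0, 2.5, 0.0]
    # Formatted to three decimal places as a percentage.
    print(f"{activation_density(sample):.3%}")
```

For the sample above this prints `25.000%`. Under these assumptions, a displayed value of 0.000% means the feature fired on so few sampled tokens that the share rounds to zero at three decimal places.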