INDEX
Explanations
words related to clothing items, specifically shirts
references to T-shirts or phrases related to T-shirts
Negative Logits
elsen          -0.77
ruciating      -0.66
trib           -0.65
ntil           -0.65
ENC            -0.62
iciary         -0.60
pregn          -0.59
judicial       -0.59
terrestrial    -0.59
witness        -0.59
Positive Logits
shirt          1.27
shirts         1.15
hirt           1.00
shirts         0.99
boy            0.98
leeve          0.94
idas           0.91
sleeve         0.87
shirt          0.86
boys           0.86
Activations Density 0.009%
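
The density figure above is commonly read as the share of tokens on which the feature fires at all. The sketch below shows one way such a percentage could be computed. It rests on assumptions not stated on this page: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold. The page does not say which cutoff it uses.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 9 of 100,000 tokens.
sample = [0.0] * 99_991 + [1.5] * 9
print(f"{activation_density(sample):.3f}%")  # prints 0.009%
```

A density this low would mean the feature is silent on nearly all tokens, which fits a feature tied to a narrow topic such as shirts.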