INDEX
Explanations
mentions of hats
Negative Logits
Token          Logit
olon           -0.70
Recover        -0.67
venant         -0.65
rans           -0.61
Document       -0.60
eway           -0.59
Citation       -0.59
Residential    -0.58
ara            -0.58
LR             -0.58
Positive Logits
Token          Logit
hats           4.11
hat            2.16
Hats           2.11
helmets        1.72
shirts         1.61
jackets        1.59
coats          1.59
masks          1.57
costumes       1.55
Hat            1.54
Activations Density 0.020%