INDEX
Explanations
words related to clothing items, especially shirts
words associated with police, limousines, and medical references
Negative Logits
PORT       -0.79
hyde       -0.68
Dear       -0.65
Patent     -0.64
Spear      -0.63
Walker     -0.61
Falls      -0.61
FIELD      -0.61
Spur       -0.61
Hendricks  -0.61
Positive Logits
inations   1.10
itational  1.02
ices       0.99
itating    0.96
utes       0.96
atorial    0.95
iency      0.94
ailable    0.94
icy        0.93
irtual     0.92
Activations Density 0.128%