INDEX
Explanations
items of clothing, specifically socks and underwear
references to socks and underwear
Negative Logits
miscar       -0.71
Alzheimer    -0.66
Huntington   -0.63
Virgin       -0.63
AUT          -0.62
ENE          -0.62
represented  -0.62
CNS          -0.61
views        -0.61
sob          -0.60
Positive Logits
socks      1.51
sock       1.16
yarn       0.97
underwear  0.96
Shoes      0.90
leeve      0.88
Shirt      0.86
nodd       0.86
ipation    0.82
worms      0.81
Activations Density 0.005%