Explanations
phrases related to cleanliness and washing
references to washing and washrooms
Negative Logits
vernment    -0.84
*/(         -0.79
iasm        -0.77
reme        -0.77
ourke       -0.71
rul         -0.67
elig        -0.67
appre       -0.67
ietal       -0.66
allery      -0.65
Positive Logits
ashore       1.14
cloth        0.97
rooms        0.95
stakes       0.91
aways        0.86
robe         0.85
houses       0.85
washing      0.80
apon         0.80
bas          0.79
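Dashboards of this kind commonly derive a feature's logit lists by projecting its decoder direction through the model's unembedding matrix, then listing the highest- and lowest-scoring tokens. The sketch below assumes that procedure. The toy 2-d vectors and the helper names `logit_effects` and `top_and_bottom` are hypothetical and are not taken from this page.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest and k lowest (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]               # lowest effect first
    top = list(reversed(ranked))[:k]  # highest effect first
    return top, bottom


if __name__ == "__main__":
    # Hypothetical toy values; a real model uses d_model-sized vectors and a full vocabulary.
    decoder_dir = [1.0, -0.5]
    unembed = {
        "washing": [0.6, -0.4],
        "rooms": [0.5, 0.0],
        "cloth": [0.2, 0.2],
        "rul": [-0.5, 0.3],
    }
    effects = logit_effects(decoder_dir, unembed)
    top, bottom = top_and_bottom(effects, k=2)
    print("highest:", top)
    print("lowest:", bottom)
```

Ranking by a single dot product per token keeps the sketch simple; it measures only the feature's direct effect on the output logits, not any effect routed through later layers.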
Activation Density: 0.012%
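Activation density is usually reported as the share of sampled token positions on which the feature fires, i.e. has an activation above zero. The minimal sketch below assumes that definition; the threshold and the corpus sample behind the 0.012% figure are not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable, Sequence


def activation_density(batches: Iterable[Sequence[float]], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds `threshold`."""
    total = 0
    active = 0
    for seq in batches:
        for act in seq:
            total += 1
            if act > threshold:
                active += 1
    return active / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical activations for two short sequences: 2 of 7 positions fire.
    acts = [[0.0, 0.0, 1.2], [0.0, 0.5, 0.0, 0.0]]
    print(f"{activation_density(acts):.2%}")

    # A density of 0.012% corresponds to roughly 12 firing positions per 100,000 tokens.
    print(round(0.012 / 100 * 100_000))
```

At a density this low, the feature is silent on almost every token, which is consistent with a narrow concept such as washing and washrooms.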