INDEX
Explanations
words related to cleanliness, specifically related to washing
references to the concept of washing or washrooms
Negative Logits
reme        -0.80
ourke       -0.78
*/(         -0.77
iasm        -0.75
Defenders   -0.69
vernment    -0.68
auri        -0.63
izons       -0.63
appre       -0.62
rador       -0.61
Positive Logits
ashore      1.09
cloth       0.98
stakes      0.95
wash        0.91
aways       0.91
houses      0.89
washed      0.84
gate        0.83
robe        0.83
rooms       0.81
Activations Density 0.005%