INDEX
Explanations
terms related to cleaning and hygiene activities
Negative Logits
Token                  Logit
(token text missing)   -0.78
[                      -0.70
“                      -0.69
<eos>                  -0.65
(token text missing)   -0.65
(                      -0.64
I                      -0.61
↵↵                     -0.60
\                      -0.60
,                      -0.59
Positive Logits
Token         Logit
cleaning      1.42
cleans        1.39
Cleaning      1.36
cleanliness   1.35
Cleaning      1.33
CLEANING      1.32
cleaners      1.30
Cleaners      1.27
Efq           1.21
cleaning      1.20
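The page does not say how these logit lists are produced. A common approach for dashboards of this kind, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the lowest and highest resulting scores. The sketch below illustrates that idea with a made-up four-token vocabulary and two-dimensional weights; the names `logit_effects` and `top_k` and all numbers are hypothetical and are not this feature's real values.

```python
# Toy sketch (assumed method): score each vocabulary token by the dot product
# of the feature's decoder direction with that token's unembedding vector,
# then report the most negative and most positive tokens.

def logit_effects(decoder_direction, unembedding, vocab):
    """Return (token, score) pairs sorted from most negative to most positive."""
    scores = []
    for token, token_vector in zip(vocab, unembedding):
        score = sum(d * w for d, w in zip(decoder_direction, token_vector))
        scores.append((token, score))
    return sorted(scores, key=lambda pair: pair[1])


def top_k(ranked, k):
    """Split an ascending ranking into the k lowest and k highest entries."""
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


# Hypothetical vocabulary and weights, chosen only for illustration.
vocab = ["cleaning", "cleans", "[", ","]
unembedding = [
    [1.0, 0.5],    # "cleaning"
    [0.9, 0.4],    # "cleans"
    [-0.8, 0.1],   # "["
    [-0.6, 0.0],   # ","
]
direction = [1.0, 0.5]

ranked = logit_effects(direction, unembedding, vocab)
negative, positive = top_k(ranked, 2)

for title, rows in (("Negative Logits", negative), ("Positive Logits", positive)):
    print(title)
    for token, score in rows:
        print(f"{token:<10} {score:+.2f}")
```

Under this reading, a positive score means the feature pushes the model toward predicting that token, and a negative score means it pushes away from it.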
Activations Density 0.245%
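The page does not define the density figure. Assuming it is the fraction of sampled tokens on which the feature has a nonzero activation, a value of 0.245% would mean roughly 245 firing tokens per 100,000. The sketch below computes that fraction for a made-up sample; `activation_density` and the sample values are hypothetical.

```python
# Toy sketch (assumed definition): density = share of tokens whose
# activation exceeds a threshold (zero by default).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 5 firing tokens out of 2,000.
sample = [0.0] * 1995 + [0.7, 1.2, 0.3, 2.4, 0.9]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```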