Explanations
words related to cleansing or purging
Negative Logits
Token        Logit
Unch         -0.71
enegger      -0.71
Standing     -0.70
worthiness   -0.66
Werner       -0.63
areth        -0.63
ONES         -0.63
Anxiety      -0.62
helmets      -0.61
Luther       -0.61
Positive Logits
Token        Logit
ple          1.32
vey          1.26
ported       1.16
POSE         1.15
pose         1.12
pure         1.09
poses        1.09
ples         1.06
ging         1.00
posed        0.99
Activations Density: 0.078%