INDEX
Explanations
words related to cleansing or removing something
variations of the word "purge."
Negative Logits
ONES       -0.72
Standing   -0.71
Unch       -0.69
areth      -0.67
lihood     -0.65
enegger    -0.65
olson      -0.65
Engel      -0.64
Werner     -0.64
LER        -0.64
Positive Logits
ple        1.25
vey        1.21
ported     1.13
ples       1.08
pose       1.07
porting    1.05
posed      1.05
poses      1.03
cell       0.99
pure       0.99
Activations Density 0.020%