INDEX
Explanations
words related to dirtiness or decay
instances of the word "gr."
Negative Logits
phrine      -0.85
)=(         -0.77
vention     -0.74
hibition    -0.69
ership      -0.69
manship     -0.66
âĹ¼         -0.66
ezvous      -0.65
PORT        -0.65
idas        -0.65
Positive Logits
udge        1.20
umpy        1.18
udging      1.16
aciously    1.15
iffin       1.07
inder       1.01
iddle       1.00
ained       1.00
iddles      0.98
ubb         0.98
Activations Density: 0.009%