INDEX
Explanations
adjectives related to weakening or reducing something
words related to reducing or hindering something
Negative Logits
boldly      -0.68
prow        -0.63
americ      -0.60
beware      -0.59
wisely      -0.59
eyed        -0.58
wont        -0.57
snail       -0.57
Scand       -0.57
Brit        -0.57
Positive Logits
uate        0.94
uates       0.87
utive       0.82
Increase    0.81
activate    0.79
chieve      0.78
ior         0.75
uating      0.75
ependent    0.74
uce         0.73
Activations Density: 0.091%