INDEX
Explanations
words related to reduction or decrease
terms related to reduction or decline
Negative Logits
rieg
-0.72
raid
-0.71
ullivan
-0.69
sol
-0.64
RA
-0.64
ORE
-0.63
lished
-0.63
dor
-0.62
nell
-0.62
ramid
-0.61
Positive Logits
proport
0.81
dimin
0.80
utive
0.78
diminish
0.77
Jagu
0.74
utical
0.71
inished
0.71
サーティワン
0.71
numb
0.69
itized
0.68
Activations Density 0.023%