INDEX
Explanations
phrases that suggest alternatives or options
Negative Logits
ires          -0.68
usercontent   -0.62
DEF           -0.60
>>            -0.58
Leilan        -0.58
SEE           -0.57
scrib         -0.57
edu           -0.56
achelor       -0.55
Required      -0.55
Positive Logits
acle          0.89
chard         0.88
ifice         0.88
nam           0.84
chid          0.79
gin           0.78
lando         0.77
ific          0.73
phr           0.72
nery          0.71
Activations Density: 0.025%