INDEX
Explanations
technical details and measurements
Negative Logits
Token      Logit
goose      -0.60
eanor      -0.60
unpre      -0.58
opausal    -0.57
Scully     -0.56
Lauder     -0.56
Sans       -0.56
Leap       -0.56
gro        -0.56
andise     -0.55
Positive Logits
Token      Logit
fig        0.72
figure     0.66
Ele        0.64
reper      0.64
ĥ          0.64
inch       0.63
ĪĴ         0.63
ibu        0.62
wid        0.62
9          0.62
Activations Density 0.250%
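
The density figure above can be read as the share of sampled tokens on which the feature fires. The page does not state how it was computed, so what follows is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of tokens with a nonzero
    (above-threshold) activation; the source does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 400.
sample = [0.0] * 399 + [1.7]
print(f"{activation_density(sample):.3f}%")  # prints 0.250%
```

With this made-up sample, one active token in 400 gives the same 0.250% shown above. The real figure would depend on the dataset and the activation threshold used.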