INDEX
Explanations
references to numerical values, particularly the word "ten"
Negative Logits
time          -0.20
side          -0.20
tica          -0.18
WithOptions   -0.17
tiler         -0.17
tega          -0.17
tight         -0.17
athers        -0.17
ockets        -0.16
tempts        -0.16
Positive Logits
acious        0.35
ancy          0.33
acity         0.32
ured          0.31
ement         0.30
ets           0.29
anted         0.28
ancies        0.28
ure           0.26
ements        0.26
Activations Density: 0.009%