INDEX
Explanations
references to purpose or intentionality in various contexts
Negative Logits
ish      -0.18
áj       -0.17
rael     -0.17
redo     -0.16
å¯       -0.16
orna     -0.16
ey       -0.15
ä        -0.15
eding    -0.15
roller   -0.15
Positive Logits
fully     0.36
ful       0.33
fulness   0.27
FUL       0.27
-built    0.26
lessly    0.21
FULL      0.18
full      0.16
tw        0.16
quoi      0.16
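The two lists above rank vocabulary tokens by how strongly the feature pushes their logits down or up. The following is a minimal sketch of how such lists are commonly produced for sparse-autoencoder features. It assumes the scores come from projecting the feature's decoder direction through the model's unembedding matrix and taking the k smallest and largest entries. This dashboard may use additional steps, such as normalization, that are not shown here. The function names `logit_effects` and `top_and_bottom` and the toy weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: decoder_vec has length d_model, and w_u is a
    d_model x vocab_size matrix given as a list of rows.
    Returns one score per vocabulary token.
    """
    if len(decoder_vec) != len(w_u):
        raise ValueError("decoder_vec length must equal number of rows in w_u")
    vocab_size = len(w_u[0])
    return [
        sum(decoder_vec[i] * w_u[i][t] for i in range(len(decoder_vec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int = 10) -> Tuple[List[Tuple[str, float]],
                                         List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three hypothetical tokens.
    vocab = ["ful", "ish", "tw"]
    w_u = [[0.3, -0.1, 0.0],
           [0.1, -0.2, 0.2]]
    effects = logit_effects([1.0, 0.5], w_u)
    neg, pos = top_and_bottom(effects, vocab, k=1)
    print("negative:", neg)
    print("positive:", pos)
```

With real model weights, `vocab` would be the tokenizer's vocabulary and `k = 10` would give lists shaped like the ones above.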
Activation Density 0.018%
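An activation density of 0.018% means the feature fires on roughly 18 of every 100,000 tokens, so it is a very sparse feature. The following is a minimal sketch that assumes density is the percentage of tokens whose activation exceeds a threshold (zero by default). The function name `activation_density` and the `threshold` parameter are hypothetical. They are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 18 firing tokens out of 100,000 corresponds to the 0.018% shown above.
    acts = [0.0] * 99_982 + [1.5] * 18
    print(f"{activation_density(acts):.3f}%")
```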