INDEX
Explanations
substrings that match particular patterns or structures within words
Negative Logits
岳            -0.18
owed         -0.17
afari        -0.16
enerator     -0.16
classifier   -0.15
esel         -0.15
oodle        -0.15
ipi          -0.15
edl          -0.15
ibar         -0.15
Positive Logits
uncated      0.19
preneur      0.18
ondheim      0.17
ivial        0.17
insic        0.17
actions      0.16
uss          0.16
IBUTE        0.16
ong          0.16
actors       0.16
Activations Density 0.067%