Explanations
phrases that reference lists or structured formats, particularly in relation to "line" items
Negative Logits
Token     Logit
ness      -0.21
mol       -0.17
name      -0.17
nt        -0.17
teen      -0.16
lick      -0.16
marine    -0.16
lings     -0.16
ly        -0.16
imb       -0.16
Positive Logits
Token       Logit
arity       0.46
aments      0.37
ament       0.35
ups         0.33
ages        0.28
age         0.25
haul        0.25
amenti      0.25
-up         0.25
amientos    0.24
Activations Density: 0.080%
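
The density figure can be read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading, assuming "fires" means an activation strictly above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 9992 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0, 0.7, 1.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.080%
```

With these made-up numbers, 8 active positions out of 10,000 give a density of the same magnitude as the one reported above.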