INDEX
Explanations
names, particularly with the pattern "-sen" at the end
the word "sen" followed by any other token
Negative Logits
Pwr            -0.73
tolerance      -0.73
hitch          -0.65
heels          -0.64
tails          -0.63
casing         -0.63
chromosome     -0.63
Adin           -0.62
grounding      -0.62
anticipation   -0.62
Positive Logits
pai            1.59
iors           1.13
ior            1.09
iture          1.05
ault           1.00
vironment      0.99
egal           0.98
seless         0.97
ocide          0.96
escent         0.96
Activations Density: 0.009%