INDEX
Explanations
instances where something is considered valid or qualified
Negative Logits
hedon       -0.84
xual        -0.65
Kut         -0.63
stricken    -0.60
hedral      -0.60
Mania       -0.58
Sisters     -0.58
Plants      -0.57
ynthesis    -0.56
lest        -0.56
Positive Logits
ating       1.32
ators       1.30
ator        1.21
ations      1.05
ates        1.04
ated        0.94
alties      0.92
ifiers      0.91
atory       0.89
iation      0.88
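
The two lists above read as the extremes of a per-token score. As a rough sketch of how such top and bottom lists could be pulled out of a token-to-score mapping, the snippet below ranks a handful of the values shown on this page. The function name `top_and_bottom` and the idea that the scores live in a plain dictionary are assumptions made for illustration; the page does not say how the lists were produced.

```python
# A few (token, score) pairs copied from the lists above.
scores = {
    "ating": 1.32,
    "ators": 1.30,
    "ator": 1.21,
    "hedon": -0.84,
    "xual": -0.65,
    "Kut": -0.63,
}


def top_and_bottom(token_scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs.

    Hypothetical helper: the highest list is ordered from largest score down,
    the lowest list from most negative score up.
    """
    ranked = sorted(token_scores.items(), key=lambda kv: kv[1], reverse=True)
    highest = ranked[:k]
    lowest = ranked[-k:][::-1]
    return highest, lowest


positive, negative = top_and_bottom(scores, 2)
print(positive)  # [('ating', 1.32), ('ators', 1.3)]
print(negative)  # [('hedon', -0.84), ('xual', -0.65)]
```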
Activations Density: 0.013%
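
A density figure like this is commonly read as the fraction of tokens on which the feature is active. The sketch below works through that reading on made-up data. The helper `activation_density`, the zero threshold, and the toy activation list are all assumptions for illustration, not the method used to produce the number on this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of entries strictly above `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the source): 13 active tokens out of 100,000.
toy_activations = [0.0] * 99_987 + [1.0] * 13
density = activation_density(toy_activations)
print(f"{density:.3%}")  # 0.013%
```

Under this reading, a density of 0.013% would mean the feature fires on roughly 13 of every 100,000 tokens.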