Explanations
words and phrases related to correct or ideal practices
Negative Logits
elic         -0.16
             -0.16
arine        -0.16
istics       -0.15
IX           -0.15
elder        -0.15
AZY          -0.15
/or          -0.15
rop          -0.15
possible     -0.14
Positive Logits
functioning   0.22
proper        0.21
-function     0.20
Proper        0.20
nouns         0.20
ity           0.18
ment          0.17
noun          0.17
fully         0.17
bred          0.17
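Token lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and sorting the per-token scores. This page does not say how its lists were computed, so the sketch below assumes that method. `top_logit_effects`, `decoder_dir`, `unembed` and `vocab` are hypothetical names, and the toy numbers are illustrative only.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Return (positive, negative) lists of (token, score) pairs.

    Hypothetical helper. Assumption: the score for token t is the dot product
    of the feature's decoder direction with column t of the unembedding
    matrix, where `unembed` has shape d_model x vocab_size.
    """
    d_model = len(decoder_dir)
    scores: List[Pair] = []
    for t, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative


if __name__ == "__main__":
    # Toy model: d_model = 2, vocabulary of 3 tokens (made-up values).
    vocab = ["proper", "elic", "ment"]
    unembed = [[0.5, -0.2, 0.1], [0.1, -0.1, 0.3]]
    pos, neg = top_logit_effects([0.4, 0.2], unembed, vocab, k=1)
    print(pos, neg)
```

Sorting the full score list once and reading it from both ends yields both lists in a single pass. For a real vocabulary of tens of thousands of tokens, a vectorised matrix product and a partial sort would be the practical choice.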
Activation density: 0.030%
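Activation density is usually reported as the share of sampled tokens on which a feature's activation is nonzero. This page states neither its exact definition nor its sample size, so treat that reading as an assumption. Under it, 0.030% corresponds to about 3 activating tokens per 10,000. The sketch below uses `activation_density`, a hypothetical helper, and a made-up activation list.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens with activation
    strictly above `threshold` (0 for a ReLU-style sparse feature).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy check: 3 nonzero activations out of 10,000 tokens.
    acts = [0.0] * 10_000
    acts[10] = acts[500] = acts[9_000] = 1.3
    print(f"{activation_density(acts):.3f}%")  # prints "0.030%"
```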