INDEX
Explanations
words related to standards or preferred methods of operation
Negative Logits
igh         -0.82
iture       -0.78
ividual     -0.78
inately     -0.76
iosyncr     -0.74
atto        -0.73
iry         -0.72
milo        -0.72
icious      -0.71
ifer        -0.68
Positive Logits
fare        0.77
Europeans   0.72
ward        0.71
soever      0.70
forward     0.68
THEY        0.68
they        0.67
backs       0.65
finding     0.65
norm        0.65
Activations Density 0.019%