Explanations
technical jargon and specialized terms related to research and development
preceding "are"
types of things
Negative Logits
Token      Logit
is         -1.03
was        -0.91
has        -0.75
does       -0.65
isn        -0.64
was        -0.62
darstellt  -0.62
Was        -0.61
Is         -0.60
hasn       -0.60
Positive Logits
Token   Logit
are     2.18
were    1.93
were    1.50
are     1.48
aren    1.48
weren   1.47
WERE    1.42
ARE     1.42
Were    1.30
Are     1.29
Activations Density 1.558%