Explanations
words that express comparisons and contrasts in various contexts
Negative Logits
Token           Logit
esh             -0.14
literal         -0.14
weekly          -0.13
textual         -0.13
idal            -0.13
(blank)         -0.13
permutations    -0.13
modulo          -0.13
ypical          -0.13
gratis          -0.13
Positive Logits
Token           Logit
problem         0.16
маз             0.16
job             0.16
thing           0.15
story           0.15
thing           0.15
solution        0.15
problema        0.15
Olympics        0.15
ÂĿ              0.15
Activations Density: 0.661%