INDEX
Explanations
patterns of comparison and equivalence in descriptions
NEGATIVE LOGITS
Token          Logit
similarly      -0.87
similar        -0.77
Similarly      -0.75
Similarly      -0.74
Similar        -0.72
Similar        -0.69
similar        -0.68
Viited         -0.67
SIMILAR        -0.64
calendriers    -0.64
POSITIVE LOGITS
Token          Logit
exact           0.86
ſame            0.82
sane            0.82
sam             0.82
self            0.79
же              0.78
zelf            0.78
sae             0.77
samym           0.76
saine           0.75
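The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The ranking step can be sketched as below, assuming the per-token effects are already available as a list aligned with the vocabulary. In many sparse-autoencoder dashboards those effects come from projecting the feature's decoder direction through the model's unembedding matrix, but the source does not say how these particular values were computed. The helper `top_logits` is hypothetical, and the toy input reuses a few rows from the tables plus one made-up neutral token.

```python
from typing import List, Sequence, Tuple


def top_logits(
    effects: Sequence[float], vocab: Sequence[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs.

    Hypothetical helper: effects[i] is assumed to be the feature's logit
    effect on token vocab[i]; how those effects are computed is not shown.
    """
    if len(effects) != len(vocab):
        raise ValueError("effects and vocab must have the same length")
    pairs = list(zip(vocab, effects))
    # Ascending sort: the most negative effects come first.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    # Descending sort: the most positive effects come first.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Toy input: a few rows from the tables plus one made-up neutral token ("the").
vocab = ["similarly", "similar", "exact", "ſame", "the"]
effects = [-0.87, -0.77, 0.86, 0.82, 0.01]
negative, positive = top_logits(effects, vocab, k=2)
print(negative)  # [('similarly', -0.87), ('similar', -0.77)]
print(positive)  # [('exact', 0.86), ('ſame', 0.82)]
```

Sorting the full list is fine for a sketch; for a vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` avoids a full sort.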
Activations Density 0.145%
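Activation density is the share of tokens on which the feature fires, so 0.145% means the feature is active on roughly 1.45 of every 1,000 tokens. A minimal sketch of the calculation, assuming "fires" means an activation strictly above zero (the source does not state the threshold); the helper `activation_density` and the activation data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumption: a feature "fires" on a token when its activation exceeds 0.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic data: 29 firing tokens out of 20,000.
acts = [0.0] * 19_971 + [1.0] * 29
density = activation_density(acts)
print(f"{density:.3%}")  # 0.145%
```

Here 29 / 20,000 = 0.00145, which the `%` format type renders as 0.145%.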