Explanations
terms related to comparisons between different entities or categories
phrases that denote comparisons or contrasts, generally indicating levels of inequality
Negative Logits
Token      Logit
akeru      -0.86
istries    -0.85
arios      -0.76
liga       -0.73
ãĤµ        -0.62
thanking   -0.62
ONEY       -0.61
Lisp       -0.60
âĢİ        -0.58
Consent    -0.58
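Two entries in the negative-logit list (the ones with logits -0.62 and -0.58 that are not plain ASCII) look like raw byte-level token strings rather than readable text. The sketch below shows one way to recover the underlying characters, assuming the dashboard displays tokens in the GPT-2 style byte-to-unicode notation; the source does not say which tokenizer is in use, so that is an assumption. The helper name `decode_token` is illustrative only.

```python
def bytes_to_unicode():
    """Rebuild the byte -> printable-character table used by GPT-2 style
    byte-level BPE tokenizers (assumed here, not confirmed by the source)."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted past U+00FF.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display):
    """Map a displayed token string back to the text it encodes (hypothetical helper)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


# The two non-ASCII entries, written with escapes to avoid copy/paste ambiguity.
token_sa = "\u00e3\u0124\u00b5"    # displayed in the list with logit -0.62
token_lrm = "\u00e2\u0122\u0130"   # displayed in the list with logit -0.58

for tok in (token_sa, token_lrm, "akeru"):
    text = decode_token(tok)
    print([f"U+{ord(c):04X}" for c in text])
```

Under that assumption, the first entry decodes to the katakana character サ (U+30B5) and the second to the invisible left-to-right mark (U+200E); plain ASCII tokens such as `akeru` pass through unchanged.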
Positive Logits
Token          Logit
slightest      0.73
ones           0.70
anymore        0.70
outer          0.69
predecessors   0.68
rivals         0.67
counterparts   0.66
altogether     0.65
itself         0.65
equals         0.64
Activations Density 0.492%
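
The source does not define the density figure. A common reading is the fraction of sampled tokens on which the feature has a nonzero activation. The sketch below computes that quantity under this assumption; `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: the page itself does not state how density is computed.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 1000 token positions, 5 of which fire.
sample = [0.0] * 995 + [1.3, 0.2, 4.1, 0.7, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

On this reading, a density of 0.492% would mean the feature fires on roughly one token in two hundred.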