INDEX
Explanations
differences or comparisons between different entities
comparative phrases that highlight distinctions between various entities or categories
Negative Logits
ueller      -0.71
ossession   -0.69
``          -0.69
earance     -0.68
oke         -0.65
ony         -0.64
aping       -0.62
went        -0.62
anqu        -0.62
onding      -0.61
Positive Logits
worldly          0.99
kinds            0.90
types            0.85
facets           0.84
sectors          0.81
mammals          0.80
countries        0.78
implementations  0.77
eras             0.76
categories       0.74
Activations Density 0.106%