INDEX
Explanations
phrases indicating differences or contrasts between various entities
phrases comparing differences between entities or concepts
Negative Logits
tti       -0.90
merga     -0.78
isphere   -0.75
taboola   -0.69
Limited   -0.69
aley      -0.68
ranked    -0.66
iband     -0.64
Dur       -0.63
bard      -0.63
Positive Logits
ours      1.08
ordinary  1.01
theirs    0.93
others    0.92
anything  0.91
other     0.90
yours     0.89
what      0.86
typical   0.83
previous  0.82
Activations Density: 0.085%