Explanations
phrases related to comparison or similarity
phrases related to matching or equivalence
Negative Logits
hement       -0.89
cation       -0.73
uler         -0.72
cember       -0.71
Discuss      -0.70
duc          -0.67
trave        -0.66
SO           -0.66
ciplinary    -0.64
Discuss      -0.63
Positive Logits
amorph        0.85
expectations  0.77
sticks        0.75
tle           0.74
stick         0.72
theirs        0.71
enance        0.70
up            0.69
ours          0.68
yours         0.67
Activations Density 0.035%