INDEX
Explanations
phrases or sentences indicating similarity or comparison
phrases indicating similarity or comparison
Negative Logits
Token          Logit
Added          -0.68
whe            -0.67
EMENT          -0.64
esses          -0.64
hess           -0.61
escription     -0.60
ourse          -0.60
erity          -0.60
nonetheless    -0.60
azz            -0.59
Positive Logits
Token            Logit
lihood           0.97
ours             0.97
oxide            0.78
theirs           0.74
ptions           0.69
angular          0.68
lier             0.66
invoke           0.65
agate            0.64
chronological    0.64
Activations Density 0.069%