INDEX
Explanations
phrases indicating contrast or contradiction
phrases indicating contrasts or antonyms
Negative Logits
lished         -0.83
zeb            -0.82
ufact          -0.80
Mush           -0.76
Annotations    -0.76
uala           -0.75
ondon          -0.72
atche          -0.71
beans          -0.69
anches         -0.68
Positive Logits
approach       0.76
icter          0.74
minded         0.74
etheless       0.74
scenario       0.72
isphere        0.71
gender         0.70
side           0.70
osite          0.69
=#             0.68
Activations Density 0.020%