Explanations
phrases that indicate a strong or noticeable difference
phrases emphasizing sharp contrasts or significant differences
Negative Logits
hops         -0.85
sembly       -0.74
imester      -0.67
diligently   -0.66
llular       -0.66
loving       -0.65
ipop         -0.65
cffff        -0.62
chance       -0.62
onz          -0.62
Positive Logits
contrast        1.21
contrasts       1.19
ly              1.05
departure       0.93
contradiction   0.91
difference      0.88
inequalities    0.88
similarities    0.86
stark           0.83
disparities     0.83
Activations Density 0.103%