INDEX
Explanations
comparisons and differences between different entities
Negative Logits
authorized   -0.74
Bay          -0.64
ãĥİ          -0.61
arse         -0.60
achment      -0.58
uler         -0.58
bean         -0.58
lear         -0.57
utters       -0.57
noon         -0.57
Positive Logits
favorably    0.99
apples       0.86
Compare      0.81
favour       0.77
isons        0.76
between      0.76
xual         0.75
sexes        0.73
compare      0.72
comparison   0.72
Activations Density 0.464%
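
As a rough illustration of how a density figure like the one above can be read: the sketch below assumes that activation density means the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the `threshold` parameter are assumptions made for illustration; the source does not state how the 0.464% value was computed.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The dashboard's own definition is not given.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    # Count tokens on which the feature is active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 4 tokens.
sample = [0.0, 0.0, 0.0, 1.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.464% would correspond to the feature being active on roughly 46 of every 10,000 sampled tokens.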