Explanations
Instances of the word "compare" and its variations
Negative Logits
çĦ¶        -0.19
ereotype   -0.17
anna       -0.16
amping     -0.15
.nz        -0.15
еÑİ        -0.14
ough       -0.14
elt        -0.14
ivia       -0.14
imore      -0.14
Positive Logits
apples      0.27
favor       0.22
unfavor     0.22
favor       0.22
favour      0.20
between     0.19
rios        0.18
isons       0.18
notes       0.18
ãģ¹         0.17
Activation density: 0.023%
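To make the density figure concrete, here is a minimal sketch. It assumes activation density means the percentage of sampled tokens on which the feature's activation is positive. The page itself does not define the term. The function name `activation_density` and the example data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens where the feature fires) / (all tokens) * 100,
    with "fires" meaning activation > 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: the feature fires on 23 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(23):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")  # 0.023%
```

Under this reading, a density of 0.023% means the feature fires on about 1 token in 4,350.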