Explanations
instances of the word "like" indicating comparison or similarity
Negative Logits
uzey    -0.16
argas   -0.15
iry     -0.14
arger   -0.14
urus    -0.14
hung    -0.14
asje    -0.13
alue    -0.13
ç¾      -0.13
ÃŃf     -0.13
Positive Logits
stoff           0.15
esson           0.15
abcdefghijkl    0.15
_utilities      0.15
thane           0.15
ests            0.15
frm             0.14
ĸī              0.14
andes           0.14
forme           0.14
Activations Density: 0.014%