INDEX
Explanations
instances of similarities between different entities or concepts
references to similarities or comparisons between different subjects
Negative Logits
FT       -0.69
adish    -0.69
Sky      -0.67
gress    -0.67
BG       -0.64
Anth     -0.63
bern     -0.63
ved      -0.63
til      -0.62
Bus      -0.61
Positive Logits
similarities      1.17
lihood            1.07
resemblance       0.96
similarity        0.88
resemb            0.88
parallels         0.87
twins             0.87
xual              0.85
DragonMagazine    0.81
ibilities         0.78
Activations Density 0.019%