INDEX
Explanations
similarities between different entities or concepts
references to similarities and comparisons between subjects
Negative Logits
Token           Logit
bern            -0.74
bay             -0.71
til             -0.69
Sky             -0.66
boarding        -0.62
bers            -0.61
mans            -0.60
FT              -0.60
helicop         -0.60
oats            -0.60
Positive Logits
Token           Logit
similarities    0.99
lihood          0.98
between         0.87
resemblance     0.86
DragonMagazine  0.84
alities         0.84
twins           0.84
Between         0.80
alogy           0.79
xual            0.78
Activations Density: 0.033%