INDEX
Explanations
comparisons between objects or entities
words related to resemblance or similarity
Negative Logits
ourse     -0.79
alloc     -0.78
imb       -0.75
arta      -0.71
FT        -0.71
load      -0.70
alt       -0.70
gard      -0.69
mouth     -0.69
deal      -0.69
Positive Logits
lihood      1.71
lier        0.94
liest       0.85
likeness    0.82
ours        0.81
liness      0.78
awei        0.73
resembling  0.72
lifeless    0.71
theirs      0.70
Activations Density 0.032%