INDEX
Explanations
comparisons between different situations or scenarios
comparative phrases emphasizing the notion of similarity
Negative Logits
hiba      -0.89
ulty      -0.86
yrinth    -0.81
iland     -0.78
bard      -0.78
iband     -0.78
acia      -0.75
iple      -0.73
icity     -0.71
ishop     -0.70
Positive Logits
lihood    1.43
lier      1.16
liest     1.10
liness    0.88
Nor       0.78
nor       0.74
ours      0.72
erous     0.71
able      0.70
anything  0.70
Activations Density 0.040%