INDEX
Explanations
phrases describing the uniqueness or differences between various entities
instances and phrases indicating comparisons or contrasts
Negative Logits
Token            Logit
quished          -0.71
then             -0.71
ileaks           -0.69
shalt            -0.68
sin              -0.67
finished         -0.65
DragonMagazine   -0.64
pleted           -0.64
HAM              -0.63
LOG              -0.63
Positive Logits
Token            Logit
terms            1.48
regards          1.24
asm              1.18
regard           1.10
Terms            0.98
determining      0.95
versely          0.92
distinguishing   0.90
respect          0.89
spite            0.89
Activations Density: 0.180%