INDEX
Explanations
phrases related to originality or uniqueness
Negative Logits
Token        Logit
avorite      -0.78
pherd        -0.77
tics         -0.67
wat          -0.64
gregation    -0.62
endas        -0.61
athed        -0.61
mania        -0.61
morph        -0.61
âĶģ          -0.61
Positive Logits
Token        Logit
place        0.98
baseman      0.89
installment  0.86
millennium   0.85
decade       0.83
iteration    0.82
instance     0.82
inning       0.81
phase        0.78
edition      0.78
Activations Density 0.064%
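
An activation density like the one above is usually read as the share of token positions on which the feature fires at all. The sketch below assumes that definition, which this page does not state; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where the feature is active.

    Assumption: "active" means the activation is strictly above `threshold`.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 6 of them with nonzero activation.
sample = [0.0] * 9994 + [0.5, 1.2, 0.3, 2.0, 0.7, 0.1]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.064% would mean the feature is active on roughly 6 to 7 of every 10,000 tokens, so it is a sparse feature.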