Explanations
phrases expressing improvement or superiority
Negative Logits
phabet      -0.70
pex         -0.65
clerosis    -0.64
urses       -0.62
regate      -0.61
gemony      -0.61
iasm        -0.61
estones     -0.61
mad         -0.59
pson        -0.59
Positive Logits
than        1.12
suited      1.08
than        0.81
served      0.80
behaved     0.78
situated    0.77
Than        0.72
luaj        0.72
safest      0.71
served      0.67
Activations Density: 0.082%
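
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be read: as the fraction of token positions on which the feature fires at all. This is an assumption about the definition, not a statement of how this dashboard computes it; the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; the dashboard's own
    computation is not given in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of them with non-zero activation.
sample = [0.0] * 9992 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0, 0.7, 1.1]

density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage with three decimals
```

Under this reading, a density of 0.082% would mean the feature is active on roughly 8 of every 10,000 tokens in the sampled text.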