INDEX
Explanations
phrases related to measurements or comparisons
references to measurement scales or frameworks
Negative Logits
esson     -0.79
uala      -0.75
vous      -0.72
hiro      -0.69
olulu     -0.67
unal      -0.67
WOR       -0.64
nor       -0.63
èĥ        -0.63
selves    -0.63
Positive Logits
scale     1.07
scales    0.90
Scale     0.81
scale     0.77
itized    0.76
invari    0.75
replica   0.75
craft     0.72
enter     0.67
scaled    0.65
Activations Density 0.009%