INDEX
Explanations
phrases indicating degree or extent
phrases that discuss varying degrees of impact or influence
Negative Logits
tty        -0.67
brook      -0.66
Scholar    -0.65
reth       -0.64
Shal       -0.64
@#&        -0.63
rium       -0.63
SPR        -0.62
«          -0.62
RY         -0.62
Positive Logits
restitution      0.70
mma              0.70
approximation    0.69
uracy            0.68
iece             0.68
olit             0.66
esm              0.62
extent           0.62
tenance          0.61
iter             0.61
Activations Density: 0.023%