INDEX
Explanations
phrases indicating a range or approximation
phrases that indicate a range of conditions or variations
Negative Logits
corridors   -0.56
Monitor     -0.54
CHR         -0.53
challeng    -0.53
Puzz        -0.52
afety       -0.52
horizont    -0.51
Vector      -0.51
verages     -0.51
Traps       -0.50
Positive Logits
nery        1.10
less        0.94
leans       0.93
nam         0.86
gin         0.86
phans       0.84
acular      0.84
chid        0.82
acle        0.80
fewer       0.77
Activations Density: 0.018%