INDEX
Explanations
mentions of numerical values and measurements
capitalized names and significant phrases
Negative Logits
metic            -0.86
authoritative    -0.77
ÂŃ               -0.76
inspected        -0.75
ALEC             -0.74
Airbnb           -0.74
laboratories     -0.73
Layer            -0.73
evaluated        -0.72
BART             -0.72
Positive Logits
and               1.43
stre              1.43
nor               1.41
him               1.41
requ              1.40
kn                1.40
cond              1.39
from              1.38
comm              1.38
while             1.38
Activation Density: 0.255%
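Activation density is commonly read as the share of token positions on which the feature fires. The sketch below shows that arithmetic under stated assumptions: activations are a flat list of per-token values, "firing" means a value above a threshold of 0, and the function name and sample data are hypothetical, not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as active on a token when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 255 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(255):
    acts[i * 391] = 1.0  # spread the firing positions across the sequence

density = activation_density(acts)
print(f"{density:.3%}")
```

With 255 active positions out of 100,000, this prints `0.255%`, the same order of sparsity reported above.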