Explanations
prepositions and conjunctions suggesting relationships or conditions
Negative Logits
gether      -0.33
oriously    -0.29
ably        -0.28
ingly       -0.27
sembled     -0.27
leep        -0.27
wards       -0.27
pects       -0.26
olated      -0.25
oretical    -0.25

Positive Logits
sy          0.22
eri         0.22
ses         0.22
sip         0.21
ml          0.21
dl          0.21
tem         0.20
s           0.19
cel         0.19
ker         0.19
Activations Density: 0.118%