INDEX
Explanations
phrases related to procedures, regulations, and official designations
prepositions and various auxiliary words indicating relationships or conditions
Negative Logits
hess          -0.70
reach         -0.67
HL            -0.60
rogens        -0.58
versions      -0.58
occup         -0.56
Manchester    -0.54
ton           -0.54
toler         -0.53
rock          -0.53
Positive Logits
etheless      0.98
Ö¼            0.84
ï¸ı           0.83
udder         0.71
apest         0.69
IVERS         0.69
ylon          0.68
terday        0.67
ulhu          0.67
azo           0.65
Activations Density: 0.300%
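
The density figure above is commonly read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only: the function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration, not the dashboard's actual computation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a non-zero
    (above-threshold) activation. The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.300% means the feature is active on roughly 3 of every 1,000 tokens in the sampled text.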