Explanations
conjunctions and phrases indicating a connection or addition
Negative Logits
Token     Logit
Ïģιο      -0.15
ardo      -0.15
æ³        -0.15
ynos      -0.14
overall   -0.14
atrix     -0.14
xon       -0.14
vester    -0.14
verts     -0.14
557       -0.14
Positive Logits
Token     Logit
ATEST     0.15
raman     0.14
ensen     0.14
yses      0.14
lt        0.14
olt       0.14
angered   0.13
à¥Ĥत      0.13
Ñĩив      0.13
amph      0.13
Activations Density 0.072%