Explanations
phrases indicating legal or procedural concepts
Negative Logits
kova        -0.19
pur         -0.16
esub        -0.15
itto        -0.14
製          -0.14
reur        -0.14
.capture    -0.14
pur         -0.14
倒          -0.14
wat         -0.14
Positive Logits
iar         0.16
adh         0.15
cih         0.14
oss         0.14
σα          0.14
uai         0.14
ingly       0.14
Grund       0.13
SPA         0.13
ucc         0.13
Activations Density: 0.004%