Explanations
phrases that indicate functions or roles served by subjects in various contexts
Negative Logits
907      -0.19
assa     -0.16
ATS      -0.14
uracy    -0.14
лаж      -0.14
rani     -0.14
ASM      -0.14
uco      -0.14
legate   -0.14
711      -0.14
Positive Logits
serve         0.23
serves        0.23
functioning   0.19
erves         0.19
function      0.18
serving       0.17
served        0.17
vides         0.17
purposes      0.16
作            0.16
Activations Density 0.081%