INDEX
Explanations
phrases indicating direct interactions or connections
Negative Logits
инкÑĥ     -0.15
klad      -0.15
grace     -0.14
ingen     -0.14
.metro    -0.14
однов     -0.14
ubat      -0.14
richt     -0.14
traps     -0.13
aret      -0.13
Positive Logits
directly  0.35
direct    0.21
Direct    0.18
Direct    0.17
enville   0.17
.direct   0.17
diret     0.17
DIRECT    0.16
zeitig    0.16
irect     0.16
Activations Density 0.032%