Explanations
references to events and changes over time
Negative Logits
eux        -0.16
него       -0.14
ť          -0.14
him        -0.14
lui        -0.14
наче       -0.14
Yourself   -0.14
/***/      -0.14
herself    -0.14
них        -0.13
Positive Logits
things      0.36
nothing     0.33
many        0.31
everything  0.31
certain     0.29
there       0.29
additional  0.28
none        0.27
anything    0.26
lots        0.26
Activations Density 0.265%