Explanations
statements expressing opinions or beliefs
Negative Logits
themselves   -0.18
должно       -0.17
yourselves   -0.14
raud         -0.14
ubat         -0.14
→↵↵          -0.14
Their        -0.14
равно        -0.14
чила         -0.14
their        -0.14
Positive Logits
himself   0.75
his       0.52
Himself   0.45
his       0.42
他的      0.36
نفسه      0.34
seinem    0.32
zijn      0.32
His       0.31
jeho      0.30
Activations Density 1.820%