Explanations
actions indicating agency, particularly in contexts of ability or causing events
Negative Logits
Token     Logit
are       -0.64
сюда      -0.62
*/        -0.62
is        -0.61
himo      -0.61
keeps     -0.56
ignores   -0.55
makes     -0.55
will      -0.54
*/)       -0.54
Positive Logits
Token     Logit
wasn      0.94
Wasn      0.88
Was       0.87
wasn      0.86
weren     0.86
było      0.86
Wasn      0.85
weren     0.81
was       0.81
was       0.79
Activations Density 1.365%