INDEX
Explanations
Repeated use of the word "just" in various contexts
Negative Logits
juuri     -0.80
just      -0.77
právě     -0.73
justru    -0.72
only      -0.71
hanya     -0.69
только    -0.68
dopiero   -0.67
רק        -0.64
тільки    -0.63
Positive Logits
plain         0.77
Simplemente   0.62
Simply        0.60
plain         0.57
Simply        0.57
Plain         0.54
इत            0.51
simply        0.48
straight      0.47
Plain         0.46
Activations Density 0.179%