INDEX
Explanations
the phrase "that's" or other variations related to affirmation or emphasis
Negative Logits
tento      -0.13
Ulus       -0.13
ấy         -0.13
thereof    -0.13
erer       -0.13
ursive     -0.12
evils      -0.12
æ²ĥ        -0.12
/non       -0.12
rằng       -0.12
Positive Logits
why        0.38
how        0.33
where      0.31
what       0.27
exactly    0.26
precisely  0.26
why        0.26
when       0.24
it         0.24
assuming   0.23
Activations Density: 0.087%