Explanations
frequent uses of the word "the."
Negative Logits
midst      -0.15
rozh       -0.14
ä¿         -0.14
ovu        -0.14
ủa         -0.13
uent       -0.13
agli       -0.13
ivities    -0.13
iska       -0.13
ertz       -0.13
Positive Logits
question   0.29
only       0.28
focus      0.25
mere       0.24
result     0.24
sheer      0.23
fact       0.23
aim        0.23
lack       0.23
odds       0.23
Activations Density 1.004%