Explanations
instances of the word "that" used in various contexts
Negative Logits
Token       Logit
Means       -0.20
Means       -0.18
pery        -0.17
ãĤ¤ãĥ¤      -0.17
_means      -0.17
means       -0.16
rud         -0.15
ứng         -0.15
means       -0.15
deaux       -0.14
Positive Logits
Token       Logit
way         0.31
away        0.29
aways       0.24
direction   0.23
away        0.21
-away       0.19
-a          0.19
-way        0.19
why         0.18
Away        0.17
Activations Density 0.014%