INDEX
Explanations
instances of the word "it", used to refer to previously mentioned subjects or objects
Negative Logits
Token       Logit
acz         -0.18
erk         -0.16
Ñĩим        -0.15
usch        -0.14
-sidebar    -0.14
auen        -0.14
atter       -0.14
imos        -0.14
Happ        -0.14
happen      -0.13
Positive Logits
Token       Logit
remains     0.21
Remain      0.19
remain      0.19
seems       0.17
wouldn      0.17
Seems       0.16
remain      0.16
safe        0.16
beh         0.15
leaves      0.15
Activations Density 0.066%