INDEX
Explanations
pronouns followed by verbs
repeated pronouns, specifically "it"
Negative Logits
Exit         -0.65
Thousand     -0.64
Benny        -0.63
Gran         -0.62
Dome         -0.61
Friend       -0.61
Dayton       -0.60
Colonial     -0.60
stead        -0.58
Advent       -0.58
Positive Logits
asca         0.90
displayText  0.86
self         0.82
unes         0.77
urses        0.75
chy          0.75
alian        0.74
intends      0.72
chwitz       0.72
indo         0.71
Activations Density 0.095%