Explanations
the word "that" in various contexts
Negative Logits
itſelf      -0.82
houſe       -0.77
Efq         -0.77
Sega        -0.74
hierogly    -0.73
Shaksp      -0.71
Majefty     -0.70
Josephus    -0.69
himſelf     -0.69
pleaſure    -0.69
Positive Logits
the          1.10
"])          0.90
it           0.83
there        0.77
"]           0.74
)";          0.71
"]);         0.70
"):          0.69
")           0.69
)")          0.69
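Feature dashboards commonly produce positive and negative logit lists like the ones above by projecting the feature's output direction through the model's unembedding. They then keep the largest and smallest entries. The following is a minimal sketch under stated assumptions, not this viewer's own code. It assumes a hypothetical decoder direction `feature_dir` and an unembedding matrix `W_U` of shape [d_model][vocab_size], both given as plain Python lists. The toy vocabulary and weights are invented for illustration. Real pipelines may also fold in layer norm or other corrections that this sketch ignores.

```python
from typing import List, Sequence, Tuple


def logit_effects(feature_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Dot product of the (hypothetical) feature direction with each
    vocabulary column of W_U, assumed to be shaped [d_model][vocab_size]."""
    vocab_size = len(W_U[0])
    return [sum(feature_dir[i] * W_U[i][t] for i in range(len(feature_dir)))
            for t in range(vocab_size)]


def top_k_logits(effects: Sequence[float], vocab: Sequence[str], k: int
                 ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with invented numbers: d_model = 2, a 4-token vocabulary.
vocab = ["the", "it", "himſelf", "Sega"]
W_U = [[1.0, 0.5, -0.5, -0.2],
       [0.1, 0.3, -0.2, -0.6]]
feature_dir = [1.0, 1.0]

effects = logit_effects(feature_dir, W_U)
pos, neg = top_k_logits(effects, vocab, k=2)
print(pos)
print(neg)
```

Sorting the full vocabulary is the simplest approach. For a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` avoid a full sort.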
Activation Density: 0.342%
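Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. It also assumes the sample is a flat list of per-token activations. Both the threshold and the helper name `activation_density` are illustrative assumptions, not taken from the viewer.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Invented sample: 3 active tokens out of 1,000.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3f}%")  # prints 0.300%
```

Under this reading, a density of 0.342% corresponds to roughly 342 active tokens per 100,000 sampled.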