INDEX
Explanations
phrases that indicate the beginning of sentences
Negative Logits
ght         -0.16
erd         -0.15
ccion       -0.15
variants    -0.15
elah        -0.15
erald       -0.15
berra       -0.15
athan       -0.14
ero         -0.14
rt          -0.14
Positive Logits
last        0.18
present     0.18
first       0.17
uly         0.17
SAME        0.17
woord       0.16
ventus      0.15
testing     0.15
wood        0.15
contr       0.15
Activations Density 0.054%
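
A density figure like the one above is commonly read as the fraction of sampled tokens on which the feature is active. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from this page, which does not state how its density was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (active tokens) / (all sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, 5 of which activate the feature.
sample = [0.0] * 9995 + [0.7, 1.2, 0.3, 2.1, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.054% would correspond to roughly 54 active tokens per 100,000 sampled.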