INDEX
Explanations
sentences, especially those that come towards the end of a passage or include transitions
occurrences of the word "sentence."
Negative Logits
BLIC       -0.85
ikarp      -0.85
rists      -0.79
jri        -0.79
eus        -0.75
rament     -0.75
IED        -0.75
roxy       -0.74
enhagen    -0.74
owered     -0.73
Positive Logits
uttered          1.18
sentences        1.02
phrases          0.95
paragraphs       0.95
sentence         0.92
quotation        0.89
mith             0.84
paragraph        0.83
text             0.83
comprehension    0.83
Activations Density 0.054%
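
The density figure is not defined on this page. A minimal sketch of one common reading follows, assuming it is the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 5 of which fire.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.054% would correspond to the feature firing on roughly 54 of every 100,000 token positions.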