INDEX
Explanations
phrases with the word "which" followed by a subject or object
occurrences of delimiters or punctuation, specifically the end of text markers
Negative Logits
Vaugh         -0.68
Seym          -0.67
Niet          -0.60
Instead       -0.60
Darling       -0.58
Fram          -0.57
Tokens        -0.57
Frie          -0.56
disadvant     -0.55
Daddy         -0.54
Positive Logits
zbollah       0.82
ersive        0.70
imes          0.62
awaits        0.61
ims           0.60
awaited       0.60
rejo          0.60
embed         0.60
;;;;;;;;;;;;  0.59
usalem        0.59
Activations Density: 0.097%