INDEX
Explanations
mentions of named individuals in the context of news or events
occurrences of a specific character or placeholder token indicating a section break or new topic
Negative Logits
Dickinson       -0.67
PLA             -0.64
CRC             -0.64
substitutes     -0.63
Yel             -0.62
totality        -0.62
unfocusedRange  -0.62
CTR             -0.62
CoC             -0.61
sinks           -0.61
Positive Logits
ceans            1.34
lymp             1.31
vernight         1.27
culus            1.27
scill            1.26
mega             1.17
liv              1.13
rient            1.12
tto              1.12
oops             1.09
Activations Density 0.030%