Explanations
phrases related to the passing of time
references to specific sections or excerpts in a text
Negative Logits
Tur        -0.72
ocr        -0.71
rice       -0.69
RAW        -0.69
pora       -0.68
iser       -0.68
ches       -0.67
rum        -0.67
isers      -0.66
resid      -0.66
Positive Logits
passages   0.98
passage    0.94
phrase     0.82
through    0.75
uality     0.71
forward    0.70
words      0.69
smanship   0.68
Passage    0.68
iru        0.67
Activations Density: 0.014%