INDEX
Explanations
instances of the word "When" indicating the start of a narrative or event
Negative Logits
Originally     -0.73
hran           -0.66
war            -0.65
Prediction     -0.64
JJ             -0.62
Bake           -0.61
Py             -0.61
Ho             -0.61
nah            -0.61
worth          -0.60
Positive Logits
heartbeat      0.76
�醒            0.76
strate         0.72
examination    0.70
realm          0.69
antically      0.69
digest         0.66
indu           0.65
waking         0.65
龍�            0.65
Activations Density 0.277%