Explanations
phrases related to revealing information, especially plot twists and spoilers
phrases related to habits and routines
Negative Logits
ayne      -0.69
hran      -0.69
apore     -0.69
lez       -0.69
english   -0.66
>>        -0.66
tonight   -0.64
greg      -0.64
版        -0.63
chell     -0.61
Positive Logits
underdog       0.97
unve           0.87
oneself        0.85
unexpected     0.85
stumble        0.85
headline       0.78
hastily        0.77
triumph        0.76
unexpectedly   0.75
suddenly       0.73
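The two lists above are the tokens whose output logits this feature most lowers and most raises. The sketch below shows one way such lists might be derived. It assumes, as is common in sparse-autoencoder feature dashboards, that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. `decoder_dir` and the toy `unembed` table are hypothetical placeholders, not this feature's real weights.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token as the dot product of the feature's decoder
    direction with the token's unembedding vector (assumed method)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Hypothetical 3-dimensional toy data; real models use thousands of dimensions.
decoder_dir = [1.0, 0.5, -0.5]
unembed = {
    "underdog":   [0.6, 0.5, -0.3],
    "unexpected": [0.5, 0.4, -0.2],
    "tonight":    [-0.4, -0.3, 0.2],
    "greg":       [-0.2, -0.4, 0.3],
}

scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(scores, k=2)
for tok, s in positive:
    print(f"+ {tok} {s:.2f}")
for tok, s in negative:
    print(f"- {tok} {s:.2f}")
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other. A real dashboard would score the full vocabulary and typically take the top 10 of each.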
Activation Density: 1.222%
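Activation density is the share of tokens on which the feature fires; here it is 1.222%. The following minimal sketch shows the arithmetic. It assumes that "fires" means an activation strictly above zero and that density is measured over a sample of dataset tokens. `activation_density` and the `sample` list are hypothetical illustrations, not the dashboard's code or data.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: 1 active token out of 80.
sample = [0.0] * 79 + [3.2]
print(f"{activation_density(sample):.3f}%")  # 1.250%
```

At the reported density, the feature would be active on roughly 12 of every 1,000 tokens.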