Explanations
moments when content is informative or thought-provoking
Negative Logits
cept          -0.15
uts           -0.14
妙            -0.14
addir         -0.13
forgettable   -0.13
kapat         -0.13
isans         -0.13
ores          -0.12
imento        -0.12
ilog          -0.12
Positive Logits
Feel          0.47
feel          0.47
Feel          0.47
feel          0.42
Enjoy         0.34
Enjoy         0.32
enjoy         0.31
please        0.30
hope          0.30
Please        0.30
Activations Density: 0.336%