INDEX
Explanations
direct speech dialogues
Negative Logits
Token         Logit
imprint       -0.45
liv           -0.45
horizont      -0.44
muse          -0.43
sacrific      -0.42
invention     -0.41
motif         -0.41
pyramid       -0.41
commem        -0.41
tremend       -0.41

Positive Logits
Token         Logit
Ķ             0.56
ï¸ı           0.55
ľ             0.55
Ļ             0.51
CNN           0.50
¦             0.50
ONSORED       0.49
Specifically  0.49
_>            0.49
said          0.47

Activations Density 0.580%