INDEX
Explanations
words related to destruction or strong actions
the end of sentences or paragraphs in the text
Negative Logits
Niet            -0.69
Frie            -0.61
Vaugh           -0.59
Pru             -0.52
enegger         -0.52
Leilan          -0.52
Berm            -0.52
Moroc           -0.50
undermin        -0.50
corrid          -0.50
Positive Logits
\":             0.60
imum            0.49
ciples          0.48
phrine          0.48
cised           0.47
pedia           0.47
DragonMagazine  0.47
clinton         0.47
pret            0.46
âĦ¢:            0.46
Activations Density 0.390%
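
The density figure above can be read as the share of token positions on which the feature fires. A minimal sketch of that calculation follows, assuming density means the fraction of positions with activation above zero; the page does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions with a nonzero
    (above-threshold) activation. The source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 39 active positions out of 10,000 tokens.
sample = [1.0] * 39 + [0.0] * 9961
print(f"{activation_density(sample):.3%}")  # prints 0.390%
```

The threshold defaults to zero because sparse features are usually exactly zero when inactive; a dashboard that counts only activations above some small cutoff would report a lower density.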