Explanations
words that convey emotions or strong emphasis
occurrences of the word "words"
Negative Logits
DERR      -0.80
olls      -0.75
izo       -0.72
ramid     -0.71
vy        -0.68
ño        -0.67
awaru     -0.64
roid      -0.62
cumbers   -0.62
millenn   -0.62
Positive Logits
mith       1.44
spoken     0.99
uttered    0.96
terday     0.87
words      0.86
poons      0.85
speak      0.81
aloud      0.81
words      0.79
writers    0.79
Activations Density 0.021%