Explanations
words that stand out or are emphasized in a text
mention of the word "words" in various contexts
Negative Logits
DERR      -0.75
izo       -0.73
ramid     -0.72
roxy      -0.69
enture    -0.67
olls      -0.65
cumbers   -0.65
ño        -0.64
romeda    -0.63
notor     -0.63
Positive Logits
mith      1.54
spoken    1.11
uttered   1.03
aloud     0.89
speak     0.87
press     0.86
words     0.84
poons     0.82
sworth    0.82
words     0.78
Activations Density: 0.023%