INDEX
Explanations
words related to negative emotions or situations
negative prefixes attached to words conveying discontent or difficulties
Negative Logits
Token       Logit
anwhile     -0.79
Trilogy     -0.77
Nanto       -0.77
Defenders   -0.75
BOOK        -0.74
Kings       -0.73
enegger     -0.72
sonian      -0.72
cair        -0.70
fman        -0.66
Positive Logits
Token       Logit
unh         0.87
ashed       0.87
ishable     0.85
oly         0.83
rep         0.80
idden       0.79
wash        0.78
scrut       0.78
undone      0.75
unp         0.74
Activations Density 0.005%