INDEX
Explanations
words related to negative events or situations, particularly ones involving harm or loss
terms related to death and significant departures
Negative Logits
vier       -0.74
enegger    -0.73
Cola       -0.72
cale       -0.72
isine      -0.69
zzy        -0.67
igmatic    -0.63
ingo       -0.63
ocaly      -0.63
overe      -0.62
Positive Logits
of          0.94
thereof     0.92
OF          0.66
Flavoring   0.65
deadline    0.61
lights      0.60
switch      0.60
portion     0.59
shuffle     0.59
ritz        0.58
Activations Density: 0.198%