Explanations
words related to negative connotations or controversies
words related to deception or harmful intentions
Negative Logits
schild        -0.75
bley          -0.68
bane          -0.66
Joy           -0.65
ĸļ            -0.62
beginnings    -0.61
dove          -0.59
boxing        -0.59
ById          -0.58
enegger       -0.56
Positive Logits
ctory         0.98
haps          0.83
cia           0.80
etric         0.77
enza          0.70
cially        0.70
rency         0.69
xia           0.69
iac           0.68
BILITIES      0.67
Activations Density: 0.059%