INDEX
Explanations
words related to positive attributes or actions, specifically focusing on "good"
phrases emphasizing the concept of "good."
Negative Logits
ĸļ            -0.76
olate         -0.76
eteria        -0.73
lets          -0.69
kson          -0.68
Sturgeon      -0.67
eters         -0.67
apse          -0.66
otom          -0.66
ulous         -0.65
Positive Logits
intentions    1.26
deeds         1.19
deed          1.14
Samar         1.10
ol            1.05
luck          1.00
reads         1.00
manners       0.95
fortune       0.93
die           0.93
Activations Density 0.084%