Explanations
words related to positive impact or benefit
the concept of "good" in various contexts
Negative Logits
Token       Logit
eters       -0.79
agos        -0.74
ptin        -0.73
hod         -0.70
âĹ¼         -0.67
Pavilion    -0.67
pper        -0.67
kson        -0.66
ocene       -0.66
gow         -0.65
Positive Logits
Token        Logit
enough       1.10
reads        1.05
deed         0.95
intentions   0.94
deeds        0.93
Samar        0.92
luck         0.91
sword        0.89
NESS         0.81
luck         0.81
Activations Density 0.056%