INDEX
Explanations
adjectives related to positive attributes or emotions
the word "super" in various contexts
Negative Logits
Seym       -0.70
edIn       -0.68
Reloaded   -0.67
llers      -0.66
Anat       -0.65
Franks     -0.64
Muse       -0.63
Granger    -0.63
Tud        -0.63
Sending    -0.62
Positive Logits
visor      1.15
imposed    1.10
nova       1.07
visory     0.96
powers     0.90
cedes      0.89
charged    0.89
charg      0.86
computer   0.86
iour       0.84
Activations Density 0.012%