INDEX
Explanations
phrases that emphasize positivity, or adjectives that denote high quality
Negative Logits
ton         -0.22
-ton        -0.18
ugo         -0.16
Ton         -0.16
swath       -0.16
tons        -0.16
sampling    -0.16
oyer        -0.16
ton         -0.15
needed      -0.15
Positive Logits
cracking    0.26
Joined      0.20
range       0.20
emot        0.19
subsid      0.18
intree      0.18
contrib     0.18
raft        0.18
rethink     0.18
advert      0.18
Activations Density: 0.318%