INDEX
Explanations
expressions of strong opinions and emotions
Negative Logits
202        -0.17
https      -0.17
ðŁij       -0.15
ï¸ı        -0.15
https      -0.15
climate    -0.15
iche       -0.15
hus        -0.15
Patreon    -0.14
ulled      -0.14
Positive Logits
Props      0.17
man        0.16
(:         0.15
:]↵        0.15
hes        0.15
loved      0.15
wait       0.14
åѦéĻ¢     0.14
ãĤĵãģª     0.14
props      0.14
Activations Density: 0.185%