INDEX
Explanations
phrases related to personal experiences and sentiments
Negative Logits
"~          -0.20
“           -0.20
reportedly  -0.20
“…          -0.20
"...        -0.19
often       -0.19
"/          -0.18
"           -0.18
"..         -0.18
recently    -0.17
Positive Logits
thing       0.34
[           0.33
whole       0.31
guy         0.28
little      0.26
nice        0.25
really      0.24
whole       0.23
-[          0.23
guys        0.23
Activations Density 0.446%