INDEX
Explanations
nouns related to specific events or professions
statements about personal experiences and emotions
New Auto-Interp
Negative logits
  Furthermore     -0.80
  éŃĶ             -0.79
  etheless        -0.76
  Moreover        -0.76
  currently       -0.74
  Indeed          -0.73
  Nevertheless    -0.72
  士              -0.72
  é¾įå            -0.72
  DATA            -0.72

Positive logits
  ,'"              1.02
  fuckin           0.93
  ,"               0.92
  ,''              0.90
  happiest         0.89
  love             0.89
  "—               0.88
  [                0.86
  gonna            0.83
  daddy            0.82
Activation density: 0.659%
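The two summary statistics in this panel can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the figure: it assumes "activation density" is the percentage of tokens on which the feature's activation is strictly positive, and that each vocabulary token already has one scalar logit-effect score to rank. The function names `activation_density` and `top_logits` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens on which the feature fires.

    Assumption: a token counts as "active" when its activation is > 0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


def top_logits(
    logit_effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs.

    Assumption: logit_effects maps each token string to a precomputed score.
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]              # smallest scores first
    positive = list(reversed(ranked))[:k]  # largest scores first
    return negative, positive


if __name__ == "__main__":
    # A few (token, score) pairs copied from the panel above.
    effects = {"Furthermore": -0.80, "Moreover": -0.76, ",'\"": 1.02, "fuckin": 0.93}
    neg, pos = top_logits(effects, 2)
    print(neg)
    print(pos)
    print(activation_density([0.0, 1.2, 0.0, 0.0]))
```

Under these assumptions, a density of 0.659% would mean the feature is nonzero on roughly 1 in 150 tokens; the sorting step is just one straightforward way to pull out the head and tail of the score distribution.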