INDEX
Explanations
instances of laughter and amusement
states or concepts
Negative Logits
Token          Logit
([             -0.54
()][           -0.51
Fuck           -0.48
nuke           -0.47
chränk         -0.46
Sink           -0.45
Houſe          -0.45
Philipp        -0.44
FUCK           -0.43
fuck           -0.43
Positive Logits
Token          Logit
laughter        1.62
Laughter        1.38
Laughter        1.38
laughter        1.22
LAUGHTER        0.81
UGHTER          0.72
applause        0.70
Applause        0.66
betweenstory    0.65
tertawa         0.64
Activation Density: 0.002%