INDEX
Explanations
emotional reactions and interpersonal dynamics
Negative Logits
embar     -0.24
(         -0.17
[         -0.17
“[        -0.16
perhaps   -0.16
ocator    -0.16
![        -0.15
perhaps   -0.15
Paren     -0.15
rac       -0.15
Positive Logits
fucking   0.25
fuck      0.25
fucked    0.23
fucks     0.23
asshole   0.22
cazzo     0.20
assh      0.20
fuck      0.20
Fuck      0.20
tonight   0.19
Activations Density: 0.047%