Explanations
instances of humor or light-hearted communication
Negative Logits
леж         -0.16
executable  -0.15
oogle       -0.14
Conserv     -0.14
Frank       -0.14
frank       -0.14
forb        -0.14
laps        -0.14
Frank       -0.13
Cruc        -0.13
Positive Logits
Seriously   0.26
Seriously   0.23
seriously   0.23
kidding     0.23
actually    0.19
Actually    0.19
Actually    0.18
actually    0.17
serious     0.16
cla         0.15
Activations Density 0.077%