INDEX
Explanations
pertinent information related to the structure and delivery of jokes.
Negative Logits
VERY           0.33
VERY           0.31
Non            0.30
nontrivial     0.30
Especially     0.30
Очень          0.28
NON            0.28
Questa         0.28
Quinta         0.28
0.27
Positive Logits
단순히           0.55
merely         0.54
mere           0.46
只是             0.43
แค่            0.39
passively      0.36
просто         0.35
Mere           0.35
simplemente    0.35
semplicemente  0.35
Activations Density 1.524%