INDEX
Explanations
instances where the audience is directly addressed
references to the audience or reader
Negative Logits
Token        Logit
ayne         -0.69
Paddock      -0.68
Called       -0.68
Ī            -0.67
Cance        -0.67
Marcos       -0.64
Mecca        -0.63
Guam         -0.62
Parameters   -0.62
Chap         -0.61
Positive Logits
Token        Logit
guys         1.17
tub          1.10
know         0.89
yourselves   0.89
're          0.82
hei          0.82
endi         0.79
RS           0.77
filthy       0.76
gentlemen    0.75
Activations Density 0.055%