INDEX
Explanations
phrases where someone is speaking or expressing an opinion
phrases that indicate attribution or references to statements made by individuals
Negative Logits
inary      -0.77
eneg       -0.73
rats       -0.73
acts       -0.72
leases     -0.71
venth      -0.71
ãĥĻ        -0.70
cffffcc    -0.69
rel        -0.67
apters     -0.66
Positive Logits
Jonathan   0.89
Joyce      0.87
David      0.87
Pamela     0.86
Laura      0.84
Katie      0.83
Jon        0.83
Ian        0.82
Diane      0.81
Polly      0.81
Activations Density: 0.032%