Explanations
personal pronouns associated with specific actions or qualities
pronouns referring to the reader or speaker
Negative Logits
Token         Logit
hap           -0.67
Wake          -0.64
Hok           -0.59
obstruction   -0.57
ãĤ¨           -0.57
endif         -0.57
Seah          -0.57
Ballard       -0.56
Sund          -0.55
detail        -0.55
Positive Logits
Token         Logit
wont          0.75
'll           0.73
cius          0.71
izons         0.69
surely        0.66
olit          0.66
erd           0.66
taboola       0.65
pton          0.65
sembly        0.65
Activations Density 0.179%