INDEX
Explanations
personal pronouns and possessive pronouns referring to someone else
statements related to blame and accountability
Negative Logits
Token      Logit
hess       -0.56
Milton     -0.53
cas        -0.52
NEC        -0.50
Tut        -0.50
Klaus      -0.48
Hammond    -0.48
Prix       -0.47
Formula    -0.47
Gates      -0.46
Positive Logits
Token      Logit
*/(        0.76
EStream    0.71
ymes       0.69
É          0.66
awaru      0.65
laughs     0.64
ngth       0.63
◼          0.63
ンジ       0.62
INO        0.61
Activations Density: 2.600%