Explanations
pronouns such as "them" and "her," especially when referring to specific actions or objects
Negative Logits
••        -0.66
Megan     -0.65
mire      -0.62
Lori      -0.61
Fulton    -0.60
Salon     -0.60
CCC       -0.60
LH        -0.58
Jon       -0.58
Laurie    -0.57
Positive Logits
selves          1.75
atically        1.55
selves          1.49
atic            1.42
self            1.42
alian           0.94
atics           0.94
conduc          0.91
individually    0.85
zbollah         0.83
Activations Density 0.526%