INDEX
Explanations
instances of the word "I" and expressions of personal opinions or experiences
Negative Logits
Token            Logit
Ŀ                -0.17
opic             -0.16
338              -0.15
-anchor          -0.14
Gy               -0.14
ãĥ«ãĥī           -0.14
discrimination   -0.14
Rodney           -0.14
gy               -0.14
Mac              -0.14
Positive Logits
Token            Logit
PLL              0.33
PLL              0.29
Tro              0.26
pll              0.25
Pretty           0.25
Tro              0.24
pll              0.24
tro              0.23
Pretty           0.22
Hanna            0.21
Activations Density: 0.002%