INDEX
Explanations
reported speech and statements made by individuals
Negative Logits
2020        -0.85
Pont        -0.82
tumblr      -0.79
ワ          -0.78
ファ        -0.76
EEE         -0.76
otype       -0.76
idelines    -0.72
Ranked      -0.72
ドラゴン    -0.72
Positive Logits
she         0.86
afterward   0.80
he          0.79
they        0.77
afterwards  0.76
cops        0.76
goodbye     0.76
harrowing   0.75
ordeal      0.74
witnesses   0.73
Activations Density 0.129%