INDEX
Explanations
references to actions or characteristics related to a specific group of people
references to the word "they" indicating a focus on people or groups being discussed
Negative Logits
Token       Logit
Eleven      -0.89
CCC         -0.85
Glob        -0.72
Tad         -0.70
Kinn        -0.70
UTC         -0.67
Globe       -0.67
Atlantis    -0.66
Cable       -0.66
Flan        -0.66
Positive Logits
Token       Logit
're         1.28
've         1.03
selves      1.02
'll         0.99
self        0.91
selves      0.89
themselves  0.89
own         0.87
perceive    0.85
deserve     0.85
Activations Density 0.229%