INDEX
Explanations
references to specific entities or groups of people within various contexts
references to various groups or collective entities, particularly focusing on their characteristics or actions
Negative Logits
Thompson    -0.66
Sweeney     -0.65
llo         -0.65
HuffPost    -0.63
storm       -0.61
Sau         -0.60
aneously    -0.60
oslav       -0.57
Mann        -0.57
024         -0.56
Positive Logits
hip          1.34
ilver        1.22
omething     1.20
chool        1.17
kaya         1.17
pite         1.14
aurus        1.14
mith         1.13
erver        1.13
ullivan      1.12
Activation Density: 0.559%