INDEX
Explanations
phrases related to social interaction and communication
phrases related to controversial topics and opinions
Negative Logits
antine    -0.80
isoft     -0.72
iencies   -0.70
ongo      -0.70
MRI       -0.66
avorite   -0.65
abase     -0.65
wald      -0.64
unker     -0.63
brance    -0.62
POSITIVE LOGITS
"@      1.33
"'      1.23
"<      1.22
"#      1.22
"...    1.20
"…      1.20
"{      1.13
"(      1.13
"%      1.11
"-      1.10
Activations Density 0.783%
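
The page does not state how the density figure is computed. One common reading is that it is the fraction of sampled token positions at which the feature's activation is non-zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions
    on which the feature is active. This is an illustrative definition,
    not one confirmed by the source page.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 8 of them active.
toy = [0.0] * 992 + [0.5] * 8
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.783% would mean the feature is active on roughly 8 of every 1000 sampled tokens.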