Explanations
phrases or words related to public or political speeches
references to public statements or comments made by individuals
Negative Logits
ccording   -0.78
otype      -0.69
ntil       -0.64
Orange     -0.62
ramid      -0.61
Rescue     -0.60
rome       -0.60
rafted     -0.60
duct       -0.57
versely    -0.57
Positive Logits
remarks     1.10
comments    0.87
æĥ          0.80
aloud       0.79
之          0.77
slurs       0.77
goodbye     0.76
dispar      0.75
uttered     0.74
ault        0.74
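Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding vectors and ranking the per-token results. This page does not state its exact method, so the following is only a minimal sketch under that assumption. The decoder vector and the per-token unembedding vectors are hypothetical two-dimensional toys, not values from the model, and the sketch ignores any layer-norm folding a real pipeline might apply.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Assumed definition: the direct logit effect of a feature on a token is the
    # dot product of the feature's decoder direction with that token's unembedding vector.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]        # most negative effects first
    top = ranked[::-1][:k]     # most positive effects first
    return top, bottom

# Hypothetical toy vectors; real decoder/unembedding vectors have thousands of dimensions.
decoder_dir = [1.0, -0.5]
unembed = {
    "remarks": [1.0, -0.2],
    "comments": [0.8, 0.0],
    "Orange": [-0.5, 0.4],
    "ntil": [-0.3, 0.5],
}
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, k=2)
print("positive:", top)
print("negative:", bottom)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; with a real vocabulary a partial selection such as `heapq.nlargest` would avoid sorting every token.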
Activation Density: 0.021%
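Assuming activation density means the share of sampled tokens on which the feature's activation is strictly positive, 0.021% works out to roughly 21 firing tokens per 100,000. A minimal sketch of that calculation follows; the function name, the zero threshold, and the sample activations are all hypothetical.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: 10,000 tokens, of which 2 activate the feature.
sample = [0.0] * 9_998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.020%"
```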