INDEX
Explanations
phrases related to accusations and their impacts on individuals
NEGATIVE LOGITS
don               -0.52
don               -0.47
DON               -0.45
do                -0.44
Don               -0.44
Don               -0.44
DON               -0.43
RectangleBorder   -0.43
Do                -0.42
ChildScrollView   -0.42
POSITIVE LOGITS
did               1.87
Did               1.81
did               1.77
Did               1.77
didn              1.59
DID               1.56
DID               1.45
didn              1.44
didnt             1.41
Didn              1.41
Activations Density: 0.887%