INDEX
Explanations
phrases indicating blame or responsibility
statements assigning blame or responsibility to specific entities or individuals
Negative Logits
arious     -0.83
cit        -0.82
obi        -0.77
dayName    -0.76
itsu       -0.75
alli       -0.75
yssey      -0.71
Dialogue   -0.71
Boo        -0.71
atar       -0.70
Positive Logits
ruining      1.43
causing      1.30
creating     1.29
ensuring     1.22
inciting     1.20
provoking    1.19
spreading    1.19
destroying   1.19
initiating   1.19
bringing     1.18
Activations Density: 0.133%