INDEX
Explanations
phrases related to campaigns or initiatives
references to specific campaigns or initiatives, particularly those related to societal issues
Negative Logits
Naz         -0.66
cruelty     -0.61
casualties  -0.59
labels      -0.59
spo         -0.59
refuge      -0.58
Integrity   -0.57
details     -0.57
frames      -0.57
spoilers    -0.57
Positive Logits
arter       4.80
arters      2.60
arts        1.29
ART         1.17
arty        1.13
eret        1.06
RNA         1.03
arth        0.99
arted       0.99
amber       0.95
Activations Density 0.012%