Explanations
mentions of the term "Anti" followed by a single word
references to anti-related themes or concepts
Negative Logits
hus             -0.78
constrained     -0.65
tantal          -0.64
peek            -0.64
alus            -0.64
calcul          -0.63
deliberations   -0.62
fortunate       -0.62
snapping        -0.61
rows            -0.61
Positive Logits
Anti            3.57
Anti            2.64
anti            1.56
anti            1.51
Ant             1.29
Hate            1.23
Counter         1.15
antid           1.15
Radical         1.13
Corruption      1.09
Activations Density: 0.013%