INDEX
Explanations
words related to positivity or benefit
terms related to positive outcomes or advantages
Negative Logits
Wolves      -0.71
Hun         -0.68
Rush        -0.68
buck        -0.67
pper        -0.66
Bur         -0.65
á           -0.65
Fever       -0.64
Barcl       -0.63
hani        -0.63
Positive Logits
beneficial  0.90
icial       0.88
iciary      0.85
rative      0.78
destro      0.78
synerg      0.75
agre        0.74
tarian      0.74
chwitz      0.73
ritional    0.72
Activations Density: 0.010%