Explanations
references to excessive quantities or amounts
Negative Logits
byn        -0.91
saf        -0.89
selves     -0.89
ologies    -0.88
Ń·         -0.86
ands       -0.81
estones    -0.79
ospels     -0.79
ĪĴ         -0.77
İĭ         -0.76
Positive Logits
emphasis         1.05
attention        0.96
firepower        0.93
baggage          0.88
negativity       0.88
inconsistency    0.86
concentration    0.84
temptation       0.84
exposure         0.83
fuss             0.83
Activations Density: 0.023%