INDEX
Explanations
references to 'sweat' and related terms
Negative Logits
xt       -0.18
mit      -0.18
men      -0.18
ric      -0.18
ne       -0.18
nc       -0.18
ctor     -0.17
so       -0.17
ways     -0.17
reo      -0.16
POSITIVE LOGITS
Swe      0.23
eter     0.23
swe      0.19
eper     0.19
eters    0.19
etch     0.18
pps      0.18
itzer    0.18
instein  0.18
stakes   0.18
Activations Density 0.012%