Explanations
words related to stealth or hidden presence
words related to lurking and crouching actions
Negative Logits
token     logit
ocr       -0.69
biased    -0.68
Luther    -0.68
Samar     -0.65
ripp      -0.63
emale     -0.61
ractor    -0.60
regn      -0.58
ract      -0.56
bias      -0.56
Positive Logits
token     logit
ched      1.19
ches      1.09
kish      1.07
ching     1.05
chers     1.04
cks       1.02
ks        1.00
chery     1.00
ked       0.96
kers      0.95
Activations Density: 0.078%