INDEX
Explanations
words related to swindling or dishonesty
Negative Logits
Token       Logit
pora        -0.69
negligent   -0.67
cised       -0.66
onomy       -0.65
_-          -0.64
resting     -0.62
degrade     -0.61
flawed      -0.60
Engel       -0.59
Kubrick     -0.59
Positive Logits
Token       Logit
indle       1.13
imming      1.12
immers      1.10
anky        1.08
itched      1.06
addle       1.05
arf         1.03
itcher      1.02
inging      1.02
itching     1.01
Activations Density: 0.012%