INDEX
Explanations
phrases related to physical constraints and potential hazards
Negative Logits
ania      -0.16
lauf      -0.16
erli      -0.16
drains    -0.15
æ®ĸ       -0.15
ombs      -0.15
çĢ        -0.15
@js       -0.15
Fucked    -0.14
uten      -0.14
Positive Logits
bunch       0.26
snag        0.25
rub         0.22
sag         0.22
rubbing     0.21
bul         0.21
Rub         0.19
cre         0.19
scratch     0.19
Restricted  0.18
Activations Density: 0.109%