INDEX
Explanations
events related to punishment and disciplinary actions
Negative Logits
xed       -0.18
kud       -0.16
planet    -0.15
Platt     -0.14
rael      -0.14
bud       -0.14
spir      -0.14
planet    -0.14
ighth     -0.14
aland     -0.14
Positive Logits
//{{      0.16
breeds    0.15
iki       0.14
Tent      0.14
zig       0.14
езд       0.13
Verified  0.13
Fro       0.13
Hanging   0.13
Spo       0.13
Activations Density: 0.326%