Explanations
phrases related to different types of fences
words related to legal or ethical breaches
Negative Logits
Token       Logit
artif       -0.88
vulner      -0.81
reflex      -0.73
bun         -0.68
sugg        -0.67
Seym        -0.67
Eston       -0.66
misunder    -0.66
metic       -0.65
Assy        -0.65
Positive Logits
Token       Logit
cffffcc     1.17
ï¸ı         1.01
âĶĢâĶĢ      0.99
mad         0.94
talk        0.93
\-          0.90
clear       0.89
null        0.88
sure        0.88
closure     0.88
Activations Density: 0.136%