Explanations
concepts related to crime and harm analysis
Negative Logits
Token      Logit
elas       -0.15
enet       -0.15
oice       -0.14
raith      -0.14
legen      -0.14
enso       -0.14
bbbb       -0.13
magna      -0.13
kil        -0.13
schemas    -0.13
Positive Logits
Token      Logit
ãĥį        0.15
ibr        0.15
ÏĨÏħ       0.14
argument   0.13
utor       0.13
åľŃ        0.13
amo        0.13
ioni       0.13
à¤Ĥà¤ļ     0.13
opr        0.13
Activations Density 0.058%