INDEX
Explanations
phrases related to legal incidents and consequences
Negative Logits
icz             -0.14
ERR             -0.14
earn            -0.14
quo             -0.14
dual            -0.13
Surveillance    -0.13
ATORS           -0.13
mor             -0.13
icao            -0.13
enny            -0.13
Positive Logits
crime           0.16
Against         0.15
Crime           0.15
soft            0.15
wargs           0.14
Against         0.14
Crime           0.14
kiye            0.14
roe             0.14
Ign             0.14
Activations Density 0.346%