INDEX
Explanations
sentences related to legal or criminal activities
Negative Logits
Token    Logit
adish    -0.66
dar      -0.65
ifter    -0.61
oufl     -0.61
Jou      -0.59
asp      -0.58
inas     -0.58
fare     -0.57
stall    -0.55
owler    -0.55
Positive Logits
Token        Logit
thereto      1.40
to           1.03
entious      0.89
To           0.84
unto         0.84
itionally    0.81
itiz         0.79
ences        0.77
sov          0.73
to           0.72
Activations Density 2.781%