INDEX
Explanations
references to individuals who have been convicted of crimes
Negative Logits
mentation    -0.86
mented       -0.80
earable      -0.75
aven         -0.71
eah          -0.71
yip          -0.69
erved        -0.69
idge         -0.67
hare         -0.66
agnetic      -0.66
Positive Logits
felon        1.03
icts         0.79
convict      0.73
rupt         0.73
guilty       0.72
fel          0.72
llor         0.72
rehabilit    0.69
iary         0.68
perjury      0.68
Activations Density 0.047%