Explanations
words related to illegality or illegal actions
Negative Logits
Token         Logit
็จ            -0.81
𝘇             -0.75
kast          -0.73
Kast          -0.71
Gump          -0.71
CELLANEOUS    -0.71
Waray         -0.69
envies        -0.68
ValueStyle    -0.66
Rost          -0.66
Positive Logits
Token         Logit
illeg         1.08
Illegal       1.00
illegal       0.97
illegal       0.96
Illegal       0.88
ilegal        0.88
illegally     0.86
gills         0.85
ileg          0.85
LEGAL         0.82
Activations Density: 0.006%