INDEX
Explanations
phrases related to legal and regulatory language
symbols and punctuation associated with expressions of opinion
Negative Logits
Token     Logit
wagen     -0.72
conduc    -0.71
destro    -0.69
creen     -0.68
mosqu     -0.68
terday    -0.67
Drawn     -0.66
Dupl      -0.65
grips     -0.64
Belg      -0.64
Positive Logits
Token     Logit
ª         1.01
most      1.00
Ĵ         0.94
¹         0.94
ij        0.89
ł         0.89
actual    0.88
¼         0.87
option    0.87
race      0.87
Activations Density: 0.078%