Explanations
phrases that express beliefs about moral or religious authority
Negative Logits
Token           Logit
arguably        -0.64
Ironically      -0.57
Ironically      -0.56
comprom         -0.55
ⓧ               -0.54
redefine        -0.54
UNSIGNED        -0.53
Segen           -0.52
caveats         -0.52
defy            -0.52
Positive Logits
Token           Logit
PreferredItem   0.81
AndEndTag       0.73
kollu           0.65
haustible       0.64
ItemBackground  0.60
Abitanti        0.60
TableBody       0.60
Ecotoxicity     0.57
]';             0.56
InputBorder     0.56
Activations Density: 0.159%