INDEX
Explanations
relationships of blame or responsibility
phrases related to assigning blame or responsibility
Negative Logits
strate     -0.79
vine       -0.78
osphere    -0.77
herer      -0.77
Layer      -0.76
izons      -0.75
Grid       -0.72
nets       -0.71
atories    -0.71
Byte       -0.71
Positive Logits
sins             1.15
inconvenience    1.05
crimes           1.04
sake             0.96
deaths           0.93
offences         0.92
theft            0.92
absence          0.89
existence        0.89
transgress       0.89
Activations Density 0.253%