INDEX
Explanations
verbs related to the consequences of actions
words related to legal consequences or outcomes
Negative Logits
pedia           -0.72
Accountability  -0.60
antry           -0.58
Roof            -0.57
yp              -0.57
Beer            -0.56
ACP             -0.54
Hom             -0.54
Carney          -0.53
unsupported     -0.53
Positive Logits
ues             1.13
ens             1.07
ilage           1.04
ue              0.97
ued             0.97
uers            0.95
oing            0.86
uing            0.86
uer             0.82
ailed           0.81
Activation Density: 0.036%