INDEX
Explanations
verbs related to change or transformation
words related to dominance or control in different contexts
Negative Logits
Token             Logit
20439             -0.68
Killer            -0.67
pedia             -0.66
Beer              -0.65
Rated             -0.62
commons           -0.61
examples          -0.60
Responsibility    -0.59
Imam              -0.59
Carney            -0.59
Positive Logits
Token             Logit
ens               1.21
sembly            0.94
oing              0.82
ilage             0.81
prise             0.80
uit               0.80
ailing            0.79
uel               0.78
chwitz            0.77
aturated          0.77
Activations Density: 0.018%