INDEX
Explanations
political and ethical terms and concepts
Negative Logits
verbs       -0.83
ku          -0.66
door        -0.63
paren       -0.62
ramid       -0.60
ammy        -0.60
gap         -0.59
packed      -0.59
activation  -0.59
cell        -0.59
Positive Logits
ments      0.89
entimes    0.89
hower      0.79
ocument    0.77
龍喚士     0.76
tainment   0.72
ĸļ         0.71
eenth      0.70
ufact      0.70
mares      0.70
Activations Density 3.271%