INDEX
Explanations
pronouns followed by action verbs
the possessive form of "it."
Negative Logits
aternal    -0.74
enment     -0.73
iaries     -0.72
veyard     -0.70
quila      -0.70
unda       -0.69
colored    -0.69
runners    -0.68
kin        -0.67
uns        -0.66
Positive Logits
è£ħ        0.69
Ł          0.64
NETWORK    0.64
IST        0.63
SEA        0.62
AW         0.61
åĬ         0.61
$$$$       0.60
使         0.60
OFF        0.59
Activations Density 0.000%