INDEX
Explanations
occurrences of pronouns referring to people or entities
pronoun followed by verb
Negative Logits
autorytatywna      -0.83
ChildScrollView    -0.79
征詢我             -0.72
twimg              -0.71
gypte              -0.68
featureID          -0.68
ProtoMessage       -0.67
aarrggbb           -0.65
ddelwed            -0.65
KEYCODE            -0.64
Positive Logits
The     0.61
It      0.54
Is      0.53
In      0.52
It      0.50
We      0.47
The     0.47
This    0.47
Is      0.46
He      0.46
Activations Density 0.176%
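
For readers unfamiliar with the density figure, the sketch below shows one way such a number could be computed. It assumes that "activations density" means the fraction of sampled token positions on which the feature fires (activation above zero); the dashboard itself does not state its definition. The function name activation_density, the threshold parameter, and the sample data are all hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" = (positions where the feature fires) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1,000 token positions, 2 of which activate.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.200%"
```

Under this reading, a density of 0.176% would mean the feature fires on roughly 1.76 of every 1,000 sampled tokens.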