INDEX
Explanations
instances of personal pronouns and their associated actions or states
Negative Logits
Token       Logit
certainly   -0.08
ubo         -0.07
riangle     -0.07
oit         -0.07
adalah      -0.07
otts        -0.07
anter       -0.07
ashtra      -0.07
ussen       -0.06
however     -0.06
Positive Logits
Token       Logit
upon        0.08
slightest   0.07
uzz         0.07
ÅĻad        0.06
suddenly    0.06
šet         0.06
gu          0.06
uz          0.06
aging       0.06
tiener      0.06
Activations Density 0.026%