Explanations
references to ownership or possession
Negative Logits
atti      -0.17
lea       -0.16
pei       -0.14
lights    -0.14
acro      -0.14
ynes      -0.14
itudes    -0.14
learner   -0.13
Realt     -0.13
micro     -0.13
Positive Logits
SELF      0.20
yourself  0.18
nger      0.17
opic      0.16
anmar     0.16
opia      0.15
ocu       0.15
ths       0.15
åĢij      0.15
guys      0.15
Activations Density 0.191%