INDEX
Explanations
references to personal abilities and capacities
Negative Logits
usi        -0.15
aji        -0.14
bÄĽ        -0.14
isel       -0.14
渡         -0.14
cél        -0.14
Lots       -0.13
ilarity    -0.13
berman     -0.13
vangst     -0.13
Positive Logits
luxury         0.31
means          0.30
lux            0.28
tools          0.25
inclination    0.24
means          0.23
Luxury         0.23
necessary      0.23
wh             0.22
ability        0.22
Activations Density: 0.088%