INDEX
Explanations
references to agents or entities in various contexts, often highlighting actions or states related to them
NEGATIVE LOGITS
iples            -0.13
thích            -0.13
QualifiedName    -0.13
.rl              -0.13
iko              -0.13
(*)(             -0.13
قول              -0.13
ateg             -0.13
.opts            -0.13
éħ               -0.13
POSITIVE LOGITS
able         0.72
ability      0.68
能够         0.58
Ability      0.57
Able         0.56
Ability      0.51
能           0.48
能           0.45
ability      0.44
abilities    0.39
Activations Density 0.368%