Explanations
instances of characters observing or interacting with their environment
Negative Logits
otte            -0.18
otta            -0.16
@student        -0.15
ãģ£ãģ±ãģĦ       -0.15
edad            -0.15
erti            -0.15
åľ              -0.15
olean           -0.15
lator           -0.14
iscrimination   -0.14
Positive Logits
/window         0.16
Ard             0.15
ild             0.15
ILD             0.15
uder            0.15
window          0.14
Bail            0.14
509             0.14
cre             0.14
ãĥ³ãĥĦ          0.14
Activations Density: 0.107%