INDEX
Explanations
phrases indicating personal reflections or feelings about experiences
Negative Logits
auen      -0.15
tü        -0.14
veis      -0.14
γου       -0.14
apiro     -0.14
ylland    -0.14
ious      -0.13
ruba      -0.13
tx        -0.13
mî        -0.13
Positive Logits
us                0.22
many              0.20
him               0.15
us                0.15
many              0.15
меч               0.15
asm               0.15
eyer              0.15
arga              0.15
ActionCreators    0.15
Activations Density 0.105%