Explanations
conversations centered around friendship and relationships
Negative Logits
adolu       -0.16
_SUPPORT    -0.16
sah         -0.15
iyon        -0.15
meyi        -0.14
vyjád       -0.14
tir         -0.14
ilos        -0.14
dex         -0.14
ivet        -0.14
Positive Logits
tell        0.47
tell        0.42
告诉        0.42
telling     0.41
tells       0.41
Tell        0.41
Tell        0.40
told        0.39
Tells       0.33
.tell       0.30
Activations Density 0.478%