INDEX
Explanations
names or references to individuals involved in conversations
Negative Logits
379         -0.16
ivet        -0.15
ension      -0.14
laughter    -0.14
TextAlign   -0.13
arme        -0.13
STANCE      -0.13
wyn         -0.13
Outcome     -0.13
noon        -0.13
Positive Logits
bies        0.16
aukee       0.15
ç¥          0.14
/or         0.13
gep         0.13
yon         0.13
yles        0.13
よく        0.12
arkadaş     0.12
Pavilion    0.12
Activations Density 0.033%