Explanations
phrases or quotations that express dialogue or statements
Negative Logits
avra    -0.18
klad    -0.15
idla    -0.15
hoe     -0.15
bare    -0.14
اعي     -0.14
лий     -0.14
Spy     -0.14
usted   -0.14
adlo    -0.14
Positive Logits
imenti     0.15
ame        0.15
according  0.15
says       0.15
.catch     0.15
catch      0.15
said       0.14
lied       0.14
ste        0.14
859        0.14
Activations Density: 0.091%