Explanations
affirmative conversational responses
Negative Logits
тернет    0.75
ującego   0.69
hendak    0.66
イス      0.65
ప్ర       0.65
enes      0.64
ಾರ        0.64
ující     0.64
es        0.63
ogram     0.63
Positive Logits
haha            1.12
true            1.00
thats           1.00
noticed         0.94
but             0.94
apt             0.93
understandable  0.93
heard           0.92
sorry           0.91
guilt           0.90
Activations Density: 0.015%