Explanations
concepts and negative feelings
Negative Logits
bulunmaktadır   1.15
אשר             1.11
तथा             1.06
Furthermore     1.01
található       0.98
amelyet         0.97
Additionally    0.93
данной          0.92
poiché          0.91
Unable          0.91
Positive Logits
hurts           1.46
feels           1.38
kinda           1.31
sucks           1.28
sure            1.26
=               1.25
gets            1.25
hates           1.24
got             1.24
felt            1.23
Activations Density 0.355%