Explanations
positive sentiment or requests
Negative Logits
0.80  (
0.77  ,
0.75  ،
0.72  a
0.66  ،
0.62
0.57  #
0.57  ,
0.54  this
0.52  allere
Positive Logits
0.92  ون
0.77  ის
0.75  and
0.75  на
0.73  ل
0.72  ق
0.71  ные
0.71  kan
0.66  ری
0.66  ور
Activations Density 7.189%