INDEX
Explanations
words or phrases indicating emotional responses or feelings
Negative Logits
ensus        -0.62
onCancelled  -0.53
ため         -0.50
hdys         -0.49
texas        -0.49
response     -0.49
Kjelder      -0.47
israel       -0.46
bres         -0.45
☸            -0.45
Positive Logits
quelcon      0.88
يتيمه        0.86
another      0.83
other        0.80
تانيه        0.76
subplots     0.75
Aiheesta     0.75
Revenir      0.72
tvrt         0.72
ďal          0.71
Activations Density 0.339%