Explanations
phrases related to statements or quotes in conversation
negative phrases or sentiments related to inability or unfulfilled desires
Negative Logits
accompan    -0.65
Discussion  -0.61
Ezek        -0.58
VL          -0.56
Category    -0.55
instead     -0.55
rather      -0.55
itaire      -0.55
renheit     -0.54
artney      -0.54
Positive Logits
anymore      1.14
nor          0.83
yet          0.73
hin          0.69
necessarily  0.68
':           0.65
\'           0.61
shit         0.61
:(           0.60
bothered     0.60
Activations Density 0.335%