INDEX
Explanations
phrases indicating discomfort or distress
Negative Logits
Token         Logit
genannt       -0.54
hiva          -0.53
purpure       -0.53
balah         -0.52
felizmente    -0.51
WRENCE        -0.50
nito          -0.50
wla           -0.50
péché         -0.50
hamshire      -0.47
Positive Logits
Token             Logit
Numerade          0.67
للاسماء           0.67
fjspx             0.61
".                0.59
unknownFields     0.58
SourceChecksum    0.58
sensed            0.57
noticing          0.56
suspicious        0.56
ніципа            0.56
Activations Density: 0.335%