Explanations
phrases related to neglectful behavior and potentially harmful situations
Negative Logits
Token       Logit
CHAT        -0.74
VIDE        -0.74
Streamer    -0.70
SW          -0.63
Rated       -0.63
soType      -0.62
isse        -0.61
Flo         -0.60
night       -0.60
Aires       -0.60
Positive Logits
Token       Logit
ful          0.97
fully        0.88
fulness      0.83
FUL          0.78
shire        0.75
icit         0.74
reatment     0.73
WARE         0.73
luster       0.72
mental       0.71
Activation Density: 0.046%