Explanations
offering help and asking what to do
Negative Logits
Token         Value
bullshit      0.49
emocional     0.47
fuck          0.46
warnings      0.46
fís           0.45
fucking       0.44
remedial      0.42
emotionally   0.42
べき          0.42
warns         0.41
Positive Logits
Token         Value
exciting      0.70
😊            0.66
楽しい        0.64
wonderful     0.63
愉快          0.62
😊            0.61
素敵な        0.61
groovy        0.60
:)            0.59
awesome       0.58
Activations Density 0.078%
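
For a sense of scale, a density of 0.078% corresponds to roughly 78 active tokens per 100,000. The sketch below shows one plausible way such a figure could be computed. It assumes density means the fraction of tokens on which the feature's activation is above zero, which the page itself does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, 78 of which activate the feature.
sample = [0.0] * 99_922 + [1.5] * 78
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature this sparse fires on fewer than one token in a thousand, so any estimate of its density needs a large token sample to be stable.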