INDEX
Explanations
descriptive labels/qualities
Negative Logits
intolerable       0.46
素晴らしい         0.46
extraordinaria    0.45
extraordinary     0.45
অসাধারণ           0.44
unbearable        0.42
horrendous        0.41
素晴               0.41
horrifying        0.40
incredible        0.40
Positive Logits
slightly          1.09
Slightly          0.95
slightly          0.92
upbeat            0.90
légèrement        0.86
playful           0.83
approachable      0.78
straightforward   0.75
breezy            0.74
somewhat          0.73
Activations Density 0.292%