INDEX
Explanations
descriptive adjectives or states
Negative Logits
Token        Value
5            0.57
Hin          0.55
५            0.53
Remark       0.50
Mention      0.48
remarkably   0.46
فظ           0.46
注目         0.45
Siehe        0.43
Dest         0.43
Positive Logits
Token        Value
Sadler       0.54
Convention   0.53
Sadie        0.50
Theatre      0.49
Amsterdam    0.49
Noir         0.49
Muse         0.49
Toxicity     0.48
Pergamon     0.48
Instit       0.47
Activations Density: 0.003%