Explanations
expressing evaluation and strong states
Negative Logits
شائع    0.36
("",    0.36
allowable    0.34
Attempts    0.34
isValid    0.34
solves    0.33
obeys    0.33
চিৎ    0.33
controllability    0.33
समझा    0.32
Positive Logits
intrigued    1.05
impressed    0.87
excited    0.86
astonished    0.85
thrilled    0.85
amazed    0.83
fascinated    0.83
annoyed    0.82
interested    0.80
skeptical    0.78
Activations Density 0.019%