Explanations
describing physical states or actions
NEGATIVE LOGITS
Token      Value
!";        0.65
!';        0.63
!;         0.62
!!!!       0.56
!!!!!      0.55
!!!!!!     0.55
!!!!       0.54
!!!!!      0.51
!".        0.50
!!!!!!!    0.50
POSITIVE LOGITS
Token                   Value
despite                 0.54
amidst                  0.47
Beside                  0.44
beside                  0.44
رغم ("despite")         0.42
impatiently             0.42
instinctively           0.42
while                   0.41
这才 ("only then")       0.41
whilst                  0.41
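Token lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the highest- and lowest-scoring vocabulary tokens. The sketch below illustrates that idea under stated assumptions: `decoder_vec` (one feature's decoder row), `W_U` (the unembedding matrix), `vocab`, and the function `top_logit_tokens` are all hypothetical names, and this is not necessarily how the values on this dashboard were computed. The sketch returns signed scores, so its negative-side scores come out below zero.

```python
import numpy as np

def top_logit_tokens(decoder_vec, W_U, vocab, k=10):
    """Rank tokens by how much a feature's decoder direction raises or lowers their logits.

    Assumed (hypothetical) inputs:
      decoder_vec: shape (d_model,), one feature's decoder direction
      W_U:         shape (d_model, vocab_size), unembedding matrix
      vocab:       list of token strings of length vocab_size
    """
    # Direct effect on every logit of adding the decoder direction to the residual stream.
    logit_effect = decoder_vec @ W_U          # shape (vocab_size,)
    order = np.argsort(logit_effect)          # token indices, ascending by effect
    # Bottom-k tokens: most strongly suppressed, most negative first.
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    # Top-k tokens: most strongly promoted, largest first.
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy numbers for illustration only; they are not taken from this dashboard.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, 0.5, -0.2, -0.9],
                [0.0, 0.1,  0.0,  0.0]])
pos, neg = top_logit_tokens(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(pos)  # [('a', 1.0), ('b', 0.5)]
print(neg)  # [('d', -0.9), ('c', -0.2)]
```

Because the score is a single matrix-vector product, it measures only the direct path from the feature to the output logits and ignores any effect routed through later layers.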
Activation density: 0.108%
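Activation density is usually reported as the share of sampled tokens on which the feature fires. A density of 0.108% works out to roughly 108 firing tokens per 100,000. The sketch below makes two assumptions: one feature's activations are available as a 1-D NumPy array, and a token counts as "firing" only when its activation is strictly above a threshold of 0.0. The function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens on which a feature fires.

    activations: 1-D array of one feature's activations over a token sample
    threshold:   activations strictly above this value count as firing (assumed 0.0)
    """
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy sample: 108 firing tokens out of 100,000.
acts = np.zeros(100_000)
acts[:108] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.108%
```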