INDEX
Explanations
ask, act, set, number, test
Negative Logits
Казіно    0.50
¹.        0.50
».        0.48
®.        0.47
ㅔ        0.47
².        0.47
³.        0.47
**:       0.46
}}$.      0.46
◆         0.45
Positive Logits
(token not shown)    0.66
(token not shown)    0.60
(token not shown)    0.44
(token not shown)    0.41
(token not shown)    0.41
einander             0.39
%                    0.39
,$                   0.38
_                    0.38
/                    0.36
Activations Density 0.014%