Explanations
references to people and their interactions
Negative Logits
Token      Logit
blink      -0.17
anto       -0.15
nat        -0.15
achi       -0.15
eyin       -0.14
eddar      -0.14
eyer       -0.14
ve         -0.14
AX         -0.14
igin       -0.14
Positive Logits
Token      Logit
/us        0.16
LineStyle  0.16
rằng       0.16
how        0.15
lok        0.15
bahwa      0.14
mine       0.14
(IM        0.13
aires      0.13
how        0.13
Activations Density 0.091%