INDEX
Explanations
phrases related to personal experiences and emotional reactions
Negative Logits
Token      Value
ασ         -0.17
atak       -0.16
ifax       -0.15
Tik        -0.15
spir       -0.15
unik       -0.14
serter     -0.14
iran       -0.14
onom       -0.14
helm       -0.14
Positive Logits
Token      Value
rys        0.17
OPS        0.15
tow        0.15
Fernandez  0.14
crowned    0.14
pit        0.14
598        0.14
Hoàng      0.14
apes       0.14
Blanch     0.14
Activations Density: 0.586%