INDEX
Explanations
locations mentioned in text
Negative Logits
Token         Logit
ãĤ¡           -0.74
Bulg          -0.73
Isles         -0.70
friction      -0.69
Hera          -0.66
terday        -0.64
fortune       -0.62
Lerner        -0.62
Pose          -0.61
slurs         -0.61
Positive Logits
Token         Logit
etermin       1.29
etermination  1.28
ownt          1.28
aughters      1.19
etermined     1.17
wayne         1.16
imensional    1.15
REAM          1.14
izzy          1.14
azz           1.13
Activations Density: 0.037%