INDEX
Explanations
mentions of a specific location or entity
Negative Logits
iros         -0.17
pta          -0.15
ä¹IJ         -0.14
iage         -0.14
InputBorder  -0.14
wr           -0.14
reff         -0.14
Gladiator    -0.14
Chin         -0.14
bod          -0.14
Positive Logits
uth          0.29
UTH          0.23
wich         0.21
les          0.19
oxetine      0.19
ces          0.17
umb          0.15
quer         0.15
ude          0.15
ux           0.14
Activations Density 0.003%