Explanations
References to the physical structure of objects or containers, particularly their dimensions and internal features.
NEGATIVE LOGITS
roma  -0.15
stra  -0.15
ames  -0.15
idden  -0.15
ome  -0.14
aren  -0.14
identity  -0.14
uel  -0.14
emen  -0.14
ison  -0.13
POSITIVE LOGITS
ALLED  0.16
Hurt  0.15
ACHE  0.15
webs  0.14
ERM  0.14
ocode  0.14
decay  0.14
teri  0.14
/out  0.14
IRCLE  0.14
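The logit lists show which output tokens this feature most promotes (positive) and most suppresses (negative). One common way to produce such lists, assumed here rather than confirmed for this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and rank the scores. The minimal sketch uses toy data; `top_logit_effects`, `decoder_vec`, and `unembed` are hypothetical names, and real vectors would come from a trained sparse autoencoder and its language model.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def top_logit_effects(
    decoder_vec: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Scored, Scored]:
    """Hypothetical helper: score every token by the dot product of the
    feature's decoder direction with that token's unembedding vector,
    then return the k most positive and the k most negative tokens."""
    scores = {
        token: sum(d * u for d, u in zip(decoder_vec, column))
        for token, column in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return positive, negative


# Toy 2-dimensional example (assumed values, not taken from this page).
decoder_vec = [1.0, -1.0]
unembed = {"a": [0.5, 0.0], "b": [0.0, 0.5], "c": [0.2, 0.1], "d": [-0.1, 0.3]}
positive, negative = top_logit_effects(decoder_vec, unembed, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing from both ends keeps the sketch short; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.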
Activation density: 0.043%
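Activation density is read here as the percentage of token positions on which the feature fires, so 0.043% corresponds to about 43 firing positions per 100,000 tokens. The hypothetical helper `activation_density` assumes that "fires" means an activation strictly above zero; the actual threshold and token sample used for this page are not stated.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: return the percentage of token positions whose
    feature activation is strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy sample (assumed): 43 firing positions out of 100,000 tokens.
sample = [0.0] * 99_957 + [1.0] * 43
density = activation_density(sample)
print(f"{density:.3f}%")
```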