INDEX
Explanations
information about a specific tool, possibly within a game
Negative Logits
sth        -0.76
:]         -0.76
sole       -0.76
elim       -0.75
recomp     -0.67
mism       -0.66
aterasu    -0.65
remod      -0.63
phase      -0.63
abrupt     -0.63
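Positive Logits
Tumblr             1.31
(token missing)    1.25
(token missing)    1.22
Tumblr             1.22
(token missing)    1.18
(token missing)    1.16
(token missing)    1.15
(token missing)    1.15
(token missing)    1.14
(token missing)    1.13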
Activations Density 0.731%