INDEX
Explanations
locations or settings and physical actions happening within those locations
Negative Logits
eatures          -0.61
resid            -0.59
External         -0.58
versatility      -0.58
FORMATION        -0.58
ŃĶ               -0.57
unspecified      -0.57
complementary    -0.56
çͰ               -0.55
newcom           -0.55
Positive Logits
anymore          1.33
someday          1.02
anytime          0.94
or               0.92
tomorrow         0.90
nor              0.90
forever          0.78
sooner           0.76
.?               0.74
?",              0.73
Activations Density: 0.844%