Explanations
phrases indicating continuation or persistence over time
Negative Logits
itsu     -0.17
nj       -0.15
Phill    -0.15
ehr      -0.14
anner    -0.14
2        -0.14
loses    -0.14
Ri       -0.14
137      -0.14
alt      -0.14
Positive Logits
alive     0.47
alive     0.40
Alive     0.34
active    0.33
Alive     0.32
_alive    0.27
-active   0.26
ACTIVE    0.26
active    0.26
_ACTIVE   0.26
Activations Density: 0.100%