INDEX
Explanations
sentences describing people's actions or states
phrases indicating the presence and actions of people
Negative Logits
Token        Logit
xxxx         -0.76
predecessor  -0.69
achment      -0.68
TX           -0.65
srfAttach    -0.63
imental      -0.63
ONSORED      -0.63
verse        -0.63
saga         -0.62
fiasco       -0.61
Positive Logits
Token           Logit
clam            1.09
understandably  0.95
ateurs          0.95
alike           0.93
accustomed      0.91
beware          0.90
routinely       0.90
eager           0.87
everywhere      0.84
encouraged      0.83
Activations Density: 0.388%