Explanations
explaining how things operate or are structured
Negative Logits
awful       0.39
horrible    0.38
terrible    0.33
hurting     0.31
неуда       0.30
fainting    0.30
失败        0.30
dreadful    0.30
losers      0.29
crappy      0.29
Positive Logits
overseen        0.45
headquartered   0.43
underpinned     0.43
actively        0.39
governed        0.39
operates        0.36
supplemented    0.36
intricately     0.36
routinely       0.35
wholly          0.35
Activations Density: 0.000%