Explanations
expressions indicating ongoing or past states or conditions
Negative Logits
Token         Logit
Directed      -0.16
deaux         -0.15
categor       -0.15
Sharper       -0.15
Hoy           -0.15
eya           -0.15
arro          -0.14
Orient        -0.14
issued        -0.14
swick         -0.14
Positive Logits
Token         Logit
studied       0.22
explored      0.21
investigated  0.20
discussed     0.19
covered       0.18
known         0.18
thoroughly    0.18
documented    0.18
fully         0.17
examined      0.17
Activations Density 0.146%