INDEX
Explanations
instances of the word "hard" and its variations related to effort and difficulty
Negative Logits
ffect      -0.18
ìķ¼        -0.17
oretical   -0.17
ffects     -0.17
que        -0.17
incy       -0.16
едак       -0.16
atre       -0.15
usu        -0.15
gether     -0.15
Positive Logits
ening      0.33
-core      0.26
cover      0.23
wig        0.23
ened       0.22
castle     0.21
working    0.21
/fast      0.20
liner      0.20
earned     0.20
Activations Density: 0.038%