Explanations
mentions of the word "Hard" at relatively high activation levels
Negative Logits
uality          -0.56
allery          -0.47
umbn            -0.46
ĸļ              -0.45
uations         -0.44
Emir            -0.43
oration         -0.41
Shutterstock    -0.40
Mens            -0.40
orative         -0.39
Positive Logits
ened            0.64
ness            0.56
ball            0.53
core            0.52
iness           0.52
iest            0.51
ening           0.51
Reply           0.50
ware            0.49
castle          0.48
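Dashboards like this one typically derive these logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then sorting the resulting per-token scores. The sketch below shows that idea on a toy example. It assumes the scores are a plain dot product of the decoder direction with each unembedding column and ignores any final layer norm the real pipeline may apply; the function name `top_logit_tokens` and all weights are hypothetical, for illustration only.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=3):
    """Hypothetical helper: score every vocabulary token by the dot product of a
    feature's decoder direction with that token's unembedding column, then
    return the k most negative and k most positive (token, score) pairs.

    unembed[i][j] is the weight from residual dimension i to vocab token j.
    Assumption: no layer norm or bias is applied before the projection.
    """
    n_vocab = len(vocab)
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(n_vocab)
    ]
    # Sort token indices from most negative to most positive score.
    order = sorted(range(n_vocab), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = [
    [0.2, -0.4, 0.6, 0.0],
    [0.4, 0.2, -0.2, 0.4],
]
vocab = ["a", "b", "c", "d"]
negative, positive = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

In a real model the vocabulary has tens of thousands of entries, so subword fragments such as "ened" or "uality" can rank highly, as they do in the lists above.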
Activation density: 16.836%
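Activation density is usually read as the fraction of sampled token positions on which the feature fires at all, so 16.836% would mean it is active on roughly one token in six. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero over some sample of text; the helper name `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose feature
    activation is strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample of per-token activations for one feature.
sample = [0.0, 1.2, 0.0, 3.4, 0.0]
density = activation_density(sample)
print(f"Activation density: {density * 100:.3f}%")
```

Raising `threshold` would count only stronger firings, which is one way a dashboard could separate routine activations from the "relatively high" ones the explanation refers to.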