Explanations
verbs related to activities or judgments
words related to perception and understanding
Negative Logits
hement       -0.77
hero         -0.61
avis         -0.60
andy         -0.59
toe          -0.57
arm          -0.57
esy          -0.57
ikers        -0.56
rene         -0.55
istration    -0.55
Positive Logits
happening     0.93
happen        0.85
ABOUT         0.84
about         0.82
happens       0.75
happened      0.72
transpired    0.72
Means         0.71
Happ          0.70
next          0.69
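The dashboard does not state how the logit values were computed. A minimal sketch, assuming the common convention for sparse-autoencoder features: project the feature's decoder direction through the model's unembedding matrix, then sort the resulting per-token scores to get the most negative and most positive tokens. The names `decoder_dir`, `W_U`, and `top_logits` are hypothetical, and the toy arrays are illustrative, not this model's weights.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: scores are the feature's decoder direction (d_model,)
    multiplied by the unembedding matrix (d_model, vocab_size).
    """
    logits = decoder_dir @ W_U            # one score per vocabulary token
    order = np.argsort(logits)            # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, a vocabulary of four tokens.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.9, -0.7],
                [0.0,  0.0, 0.0,  0.0]])
neg, pos = top_logits(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('c', 0.9), ('a', 0.5)]
```

Real pipelines may also fold in layer normalization or other scaling before the unembedding; this sketch leaves that out.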
Activation Density: 0.196%
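A minimal sketch of the density figure, assuming it means the percentage of sampled tokens on which the feature's activation is positive (the usual reading for dashboards of this kind). The function name `activation_density` and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(activations):
    """Percentage of tokens on which the feature fires (activation > 0)."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# Toy activations over five tokens: the feature fires on two of them.
print(activation_density([0.0, 0.3, 0.0, 0.0, 1.2]))  # 40.0
```

Under that reading, 0.196% means the feature fires on roughly 2 of every 1,000 sampled tokens, so it is a sparse feature.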