Explanations
verbs related to actions performed or processes carried out
Negative Logits
Token         Logit
assi          -0.66
rium          -0.64
Flavoring     -0.62
TON           -0.61
akia          -0.59
yip           -0.59
Rahman        -0.58
recruitment   -0.56
Fas           -0.55
Nguyen        -0.54
Positive Logits
Token         Logit
differently   1.19
elsewhere     1.03
solely        1.03
concurrently  1.03
correctly     1.01
incorrectly   0.99
abroad        0.97
exclusively   0.97
indoors       0.96
cheaply       0.95
Activations Density: 0.203%