INDEX
Explanations
actions related to prevention or prohibition
phrases that indicate prevention or avoidance of certain actions or conditions
Negative Logits
ratulations   -0.87
quote         -0.75
rongh         -0.74
partName      -0.73
isode         -0.72
mask          -0.72
starter       -0.70
ios           -0.70
width         -0.70
hai           -0.69
Positive Logits
afar          1.29
whence        1.09
thence        1.00
scratch       0.98
anywhere      0.86
inside        0.84
abroad        0.84
Brune         0.79
elsewhere     0.77
wherever      0.76
Activations Density 0.208%
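
An activation density figure like the one above is commonly read as the fraction of token positions on which the feature fires at all. The sketch below shows one way such a number could be computed, assuming that "fires" means an activation strictly above a threshold of zero. The function name `activation_density`, the threshold, and the sample data are hypothetical and are not taken from this page; the sample was built only so that the result matches the format of the reported value.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: 26 active positions out of 12,500 tokens.
sample = [0.0] * 12474 + [1.0] * 26
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density well below 1% indicates a sparse feature that is silent on the vast majority of tokens.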