Explanations
sentences with a strong emphasis on prohibition or warning against certain actions
phrases indicating conditions or restrictions related to actions or behaviors
Negative Logits
Voting      -0.68
Borough     -0.64
Room        -0.61
boarding    -0.60
Eye         -0.60
Meadow      -0.60
hips        -0.59
nerv        -0.59
Cards       -0.59
Upper       -0.58
Positive Logits
violates      1.09
occurs        1.05
involves      1.04
contradicts   0.99
satisfies     0.97
accompanies   0.95
exceeds       0.94
resembles     0.94
contains      0.93
entails       0.93
Activations Density: 0.179%
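
As a rough illustration of what an activation density figure like this one usually measures, the sketch below computes the fraction of tokens on which a feature's activation is above a threshold. The function name, the zero threshold, and the sample data are assumptions made for illustration; the page does not state how its figure was actually computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    `activations` is a flat sequence with one feature activation per token.
    The zero default threshold is an assumption, not taken from the page.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 179 of which activate the feature.
sample = [0.0] * 99_821 + [1.5] * 179
density = activation_density(sample)
print(f"{density:.3%}")
```

With this made-up sample, 179 active tokens out of 100,000 give a density that matches the figure reported above, which means the feature is silent on the vast majority of tokens.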