INDEX
Explanations
phrases that indicate limitations or restrictions in various contexts
Negative Logits
Token     Logit
/misc     -0.15
urd       -0.15
urch      -0.14
partly    -0.14
misc      -0.13
лон       -0.13
ittel     -0.13
Stall     -0.13
Ì£        -0.13
935       -0.12
Positive Logits
Token        Logit
only         0.72
ONLY         0.65
limited      0.64
only         0.62
restricted   0.58
Only         0.57
limited      0.57
ONLY         0.56
Only         0.56
confined     0.54
Activations Density 0.462%
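
The density figure above can be read as the fraction of sampled token positions on which the feature fires. The source does not define the term, so that reading is an assumption, and the function name, the threshold parameter, and the toy data below are all hypothetical. A minimal sketch of the calculation under that assumption:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled positions where the
    feature is active; the source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 1000 token positions, 5 of which activate the feature.
toy = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.462% would mean the feature is active on roughly 46 of every 10,000 sampled tokens.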