INDEX
Explanations
phrases indicating performance or outcome assessment, particularly focusing on whether something is performing well or not
expressions of performance or effectiveness
Negative Logits
adena      -0.79
ategory    -0.78
ory        -0.76
ruce       -0.72
İĭ         -0.72
hyde       -0.71
ules       -0.70
atto       -0.69
orical     -0.66
mitting    -0.65
Positive Logits
enough     1.07
enough     0.99
esley      0.79
Enough     0.77
behaved    0.77
espie      0.73
suited     0.71
liked      0.70
baum       0.68
alright    0.66
Activations Density: 0.031%