Explanations
phrases where the concept of 'being fine' or 'acceptable' is expressed
expressions of approval or adequacy
Negative Logits
riber    -0.77
rush     -0.73
orical   -0.71
Kut      -0.70
iq       -0.70
Hack     -0.70
nikov    -0.69
raped    -0.69
ulhu     -0.69
erity    -0.67
Positive Logits
tuning   1.02
tuned    0.94
Gael     0.83
tune     0.78
fine     0.74
Fine     0.74
Haven    0.71
fine     0.71
Fine     0.70
baum     0.70
Activations Density: 0.010%