INDEX
Explanations
intensifier for strong adjectives
Negative Logits
Token       Value
Absolute    0.86
Absolute    0.80
2           0.77
absolute    0.75
absolute    0.71
абсолю      0.68
ও           0.64
DI          0.64
:           0.64
۲           0.59
Positive Logits
Token       Value
ين          0.90
ام          0.89
ной         0.86
ли          0.85
ن           0.80
v           0.79
ва          0.77
vq          0.73
ів          0.73
ку          0.73
Activations Density 0.003%