INDEX
Explanations
adjectives that describe strength and significance
Negative Logits
zelve
-0.70
aikaa
-0.70
mourut
-0.67
kaynağından
-0.66
photolibrary
-0.65
itſelf
-0.63
llorando
-0.62
apparti
-0.62
mukana
-0.62
bbene
-0.62
Positive Logits
]))
0.71
simple
0.71
%"),
0.71
good
0.70
"])){
0.66
']}
0.65
")));
0.64
}));
0.64
mild
0.62
]));
0.62
Activations Density 0.817%