Explanations
states or descriptions after 'be' verbs
Negative Logits
Suppression   0.42
सिने   0.41
untitled      0.40
Driven        0.40
ሠ   0.39
وسی   0.38
Privilege     0.38
synonyms      0.38
jad           0.38
вари   0.37
Positive Logits
insistent     0.84
insisting     0.73
adamant       0.71
complaining   0.69
gracious      0.66
insist        0.65
abusive       0.64
furious       0.64
unusually     0.61
visibly       0.60
Activations Density: 0.011%