INDEX
Explanations
Interactions and dialogues involving family members and personal relationships.
Negative Logits
vermag      -0.65  (German, "is able to")
obtaining   -0.63
occurring   -0.62
residing    -0.62
אשר         -0.62  (Hebrew, "which/that")
possessing  -0.61
sahiptir    -0.61  (Turkish, "has/possesses")
utilising   -0.60
viewing     -0.59
pertanto    -0.59  (Italian, "therefore")
Positive Logits
said      0.76
fucked    0.69
freaked   0.67
told      0.67
got       0.65
SAID      0.64
talked    0.63
kinda     0.63
figured   0.63
thought   0.62
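The page does not say how the two logit lists were computed. One common approach for sparse-autoencoder features, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then sort the scores. Positive scores mean the feature pushes that token's logit up, and negative scores mean it pushes the logit down. The sketch below is minimal and uses pure Python. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not data from this page.

```python
# Hypothetical sketch: rank vocabulary tokens by the direct logit effect of
# one feature. Assumes decoder_dir is the feature's decoder vector and
# unembed maps each token to its unembedding vector (same dimension).

def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembedding of token)}."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return (k highest-scoring, k lowest-scoring) (token, score) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy 2-dimensional example with made-up vectors (illustration only).
toy_unembed = {
    "said": [0.8, 0.1],
    "told": [0.6, 0.5],
    "obtaining": [-0.6, 0.2],
    "residing": [-0.5, 0.0],
}
pos, neg = top_and_bottom(logit_effects([1.0, 0.0], toy_unembed), 2)
print(pos)  # the two most-promoted tokens
print(neg)  # the two most-suppressed tokens
```

Real dashboards work with vocabularies of tens of thousands of tokens, so they typically use a matrix product and a top-k routine rather than Python loops. The ranking logic is the same.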
Activation density: 0.374%
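Activation density is usually read as the fraction of token positions where the feature is active, so 0.374% means the feature fires on roughly 4 tokens in every 1,000. The sketch below assumes that "active" means an activation strictly above 0. The page does not state its threshold. `activation_density` is a hypothetical helper, and the input list is made up.

```python
# Hypothetical sketch: activation density as the share of token positions
# whose feature activation exceeds a threshold (assumed to be 0 here).

def activation_density(activations, threshold=0.0):
    """Fraction of positions where the activation is above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Made-up example: the feature fires on 4 of 1,000 token positions.
density = activation_density([0.0] * 996 + [1.0] * 4)
print(f"{density:.3%}")  # 0.400%
```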