INDEX
Explanations
interactions and relationships among family members
Negative Logits
fsp      -0.17
erule    -0.16
arel     -0.15
etik     -0.15
&o       -0.14
allen    -0.14
illis    -0.14
rious    -0.13
riere    -0.13
inar     -0.13
Positive Logits
convince      0.52
persu         0.50
convincing    0.49
persuade      0.49
convin        0.47
persuasion    0.46
persuaded     0.44
convinced     0.43
persuasive    0.43
pressure      0.42
Activations Density 0.595%