Explanations
references to specific actors and their roles in films
Negative Logits
BCHP      -0.16
andle     -0.15
Glasgow   -0.14
ヌ        -0.14
熊        -0.14
ryb       -0.14
Nx        -0.14
achten    -0.13
CPS       -0.13
azzi      -0.13
Positive Logits
Iron      0.48
Stark     0.45
Tony      0.41
Iron      0.38
Tony      0.33
iron      0.32
IRON      0.31
Armor     0.30
Pepper    0.29
armour    0.29
Activation density: 0.013%
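Activation density is presumably the fraction of token positions on which this feature fires, meaning its activation is above zero. Read that way, 0.013% is about 13 positions per 100,000 tokens. The sketch below assumes that definition and a threshold of zero. The function name activation_density and its threshold parameter are illustrative and not the dashboard's own code.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    # acts: one feature's activations over every token position in a sample corpus.
    # Returns the fraction of positions whose activation exceeds the threshold.
    acts = np.asarray(acts)
    return np.count_nonzero(acts > threshold) / acts.size


# Toy corpus: 13 firing positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:13] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.013%
```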