INDEX
Explanations
pronouns and related words
pronouns and verbs that refer to entities or groups
Negative Logits
Militia      -0.71
RTX          -0.70
Care         -0.69
Kinn         -0.68
Lou          -0.68
OL           -0.66
CCC          -0.66
Karn         -0.65
LOT          -0.65
Passenger    -0.64
Positive Logits
contained    0.95
involve      0.90
intersect    0.89
occur        0.88
involves     0.88
contain      0.87
originate    0.87
occurring    0.86
coincide     0.85
arising      0.84
Activations Density: 0.815%