INDEX
Explanations
topics related to legal cases, political figures, and controversial issues
topics related to social and political issues
Negative Logits
comr          -0.74
mathemat      -0.66
incap         -0.66
ailability    -0.64
ITNESS        -0.61
omever        -0.58
princ         -0.57
secondly      -0.57
ModLoader     -0.57
orically      -0.56
Positive Logits
Replay           1.18
}}               0.85
<|endoftext|>    0.83
}                0.77
ï                0.77
|                0.76
»                0.74
}}               0.74
>]               0.74
]                0.73
Activations Density 0.182%