INDEX
Explanations
clickable text prompts
phrases containing the word "Click" that signal interactions with links or buttons
Negative Logits
utive         -0.80
Scotia        -0.73
Chancellor    -0.68
Croat         -0.65
ASC           -0.62
VAT           -0.61
val           -0.60
OUN           -0.59
Warsaw        -0.59
tracts        -0.57
Positive Logits
prints        0.81
Click         0.80
click         0.78
Sign          0.76
lish          0.75
whe           0.71
er            0.71
esome         0.70
ety           0.70
tips          0.70
Activations Density 0.013%
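
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of sampled token positions on which the feature's activation is above zero; that definition, the function name `activation_density`, and the sample data are all illustrative assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (# positions where the feature fires) / (# positions sampled).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 13 of 100,000 sampled token positions.
sample = [0.0] * 99_987 + [1.5] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.013%
```

Under that assumption, a density of 0.013% would mean the feature fires on roughly 13 of every 100,000 tokens, which fits a feature tied to a narrow pattern such as "Click" prompts.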