Explanations
the short honest key answer
Negative Logits
Extensive       0.63
Once            0.62
Highly          0.62
wówczas         0.61
Objectives      0.59
منذ             0.59
Considerable    0.58
highly          0.57
thereafter      0.56
sämt            0.56
Positive Logits
key             1.98
beauty          1.57
key             1.50
truth           1.50
KEY             1.46
关键            1.44
thing           1.43
trick           1.42
clave           1.39
point           1.37
Activations Density 0.141%