Explanations
friend, every, urgency, tin
Negative Logits
Token       Value
🧛          0.45
lumps       0.40
pts         0.39
kulit       0.38
skimming    0.38
stven       0.38
Peker       0.38
pemer       0.37
iç          0.37
spirals     0.36
Positive Logits
Token       Value
friend      0.55
chance      0.54
nerve       0.54
Friend      0.49
Tin         0.49
TIN         0.46
friend      0.45
Friend      0.45
fear        0.44
cop         0.44
Activations Density 0.000%